Interval estimation for the difference between independent proportions: Comparison of eleven methods

Abstract

Several existing unconditional methods for setting confidence intervals for the difference between binomial proportions are evaluated. Computationally simpler methods are prone to a variety of aberrations and poor coverage properties. The closely interrelated methods of Mee and Miettinen and Nurminen perform well but require a computer program. Two new approaches which also avoid aberrations are developed and evaluated. A tail area profile likelihood based method produces the best coverage properties, but is difficult to calculate for large denominators. A method combining Wilson score intervals for the two proportions to be compared also performs well, and is readily implemented irrespective of sample size.
*Correspondence to: Robert G. Newcombe, Senior Lecturer in Medical Statistics, University of Wales College of Medicine, Heath Park, Cardiff CF4 4XN, U.K.
CCC 0277-6715/98/080873-18$17.50. Received May 1995; revised July 1997. © 1998 John Wiley & Sons, Ltd.
STATISTICS IN MEDICINE
Statist. Med. 17, 873–890 (1998)
ROBERT G. NEWCOMBE*
Senior Lecturer in Medical Statistics, University of Wales College of Medicine, Heath Park, Cardiff CF4 4XN, U.K.
1. INTRODUCTION
Interval estimation for proportions and their differences encounters two problems that cannot arise in the continuous case: intervals that do not make sense, termed aberrations; and a coverage probability (achieved confidence level) that can be quite different to the intended nominal $1-\alpha$. Vollset¹ and Newcombe² are recent comparative evaluations of different available methods for the single proportion.
Similar issues apply to the difference between two proportions, a particularly important
situation, arising naturally in prospective comparative studies such as the randomized controlled
clinical trial. Unfortunately, standard statistical software including Minitab, SPSS and SAS, and
even StatXact, has nothing to offer the user. Hence, by default, the computationally simplest
asymptotic methods continue to be used, despite their known poor coverage characteristics and
propensity to aberrations.
Table I shows the notation adopted for the comparison of two independent binomial proportions. It is assumed that the denominators $m$ and $n$ are fixed, leading to unconditional methods. Appendix I gives methods for the ratio of two proportions, which assume a different conditioning, namely $m + n$ fixed.
Table I. Notation for comparison of two independent proportions

Observed frequencies:
                  Sample 1    Sample 2
    +                a           b
    −                c           d
    Total            m           n

Theoretical proportions: $E(A/m) = \pi_1$, $E(B/n) = \pi_2$.
Observed proportions: $p_1 = a/m$, $p_2 = b/n$.
Here $A$ and $B$ denote random variables of which $a$ and $b$ are realizations.

Reparameterization:
Parameter of interest $\theta = \pi_1 - \pi_2$
Nuisance parameter $\psi = (\pi_1 + \pi_2)/2$.
Beal³ reviewed and evaluated several asymptotic unconditional methods (see Section 2 for explicit formulae). All of these involve identifying the interval within which
$$(\theta - \hat\theta)^2 \le z^2 V(\tilde\psi, \tilde\theta),$$
where $\hat\theta = a/m - b/n$, and
$$V(\psi, \theta) = \omega\{4\psi(1-\psi) - \theta^2\} + 2\nu(1-2\psi)\theta = \pi_1(1-\pi_1)/m + \pi_2(1-\pi_2)/n$$
is the variance of $\hat\theta$, $\omega = \tfrac14(1/m + 1/n)$, $\nu = \tfrac14(1/m - 1/n)$, and $z$ is the standard Normal deviate associated with a two-tailed probability $\alpha$. Here $\tilde\psi$ and $\tilde\theta$ denote hypothetical values of $\psi$ and $\theta$. The simple asymptotic method involves substitution of MLEs, $\tilde\psi = \hat\psi$ and $\tilde\theta = \hat\theta$. This may be improved upon in several ways using $\tilde\theta = \theta$ and solving for $\theta$, which is analogous to the method of Wilson⁴ for the single proportion. A simple alternative class of estimators of this form involves setting $\tilde\theta = \theta$ and taking $\tilde\psi$ as a Bayes posterior estimate of $\psi$. Beal examined two resulting methods, termed the Haldane and Jeffreys–Perks methods; both performed generally better than the simple asymptotic method, and of the two, that of Jeffreys–Perks was preferable. These methods are, however, prone to certain novel anomalies, especially latent overshoot, as described later.
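The identity for $V(\psi, \theta)$ can be checked numerically. The minimal Python sketch below evaluates both forms of the variance at a few arbitrary parameter values; the function names are illustrative and not taken from the paper.

```python
def variance_theta_psi(psi: float, theta: float, m: int, n: int) -> float:
    """V(psi, theta) = omega*{4*psi*(1-psi) - theta^2} + 2*nu*(1-2*psi)*theta."""
    omega = 0.25 * (1 / m + 1 / n)
    nu = 0.25 * (1 / m - 1 / n)
    return omega * (4 * psi * (1 - psi) - theta ** 2) + 2 * nu * (1 - 2 * psi) * theta


def variance_direct(pi1: float, pi2: float, m: int, n: int) -> float:
    """pi1*(1-pi1)/m + pi2*(1-pi2)/n, the variance of a/m - b/n."""
    return pi1 * (1 - pi1) / m + pi2 * (1 - pi2) / n


if __name__ == "__main__":
    pi1, pi2, m, n = 0.8, 0.6, 70, 80
    theta, psi = pi1 - pi2, (pi1 + pi2) / 2  # the (theta, psi) reparameterization
    print(variance_theta_psi(psi, theta, m, n), variance_direct(pi1, pi2, m, n))
```

The two printed values should agree to rounding error, since $\pi_1 = \psi + \theta/2$ and $\pi_2 = \psi - \theta/2$.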
Beal also evaluated the closely interrelated methods of Mee⁵ and Miettinen and Nurminen,⁶ which are based on, but superior to, that of Anbar.⁷ Here $\tilde\theta = \theta$ and $\tilde\psi = \psi_\theta$, the profile estimate of $\psi$ given $\theta$, that is, the MLE of $\psi$ conditional on the hypothesized value of $\theta$. The form of this is described in Appendix II, distinguishing the four cases NZ (no zero cells), OZ (one zero), RZ (two zeros in the same row) and DZ (two zeros on the same diagonal). The Miettinen–Nurminen method involves imputing to $\hat\theta$ a variance which is $(m+n)/(m+n-1)$ times as large as the expression for $V$ above. Recently Wallenstein⁸ published a closely related non-iterative method with similar coverage.
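The profile estimate $\psi_\theta$ and the Mee and Miettinen–Nurminen intervals can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' program: $\psi_\theta$ is found by a golden-section search on the log-likelihood, which is concave in $\psi$, rather than from the closed-form expressions of Appendix II, and each limit is located by bisection, assuming the score statistic crosses $z$ only once on each side of $\hat\theta$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95 per cent


def loglik(a, m, b, n, theta, psi):
    """Binomial log-likelihood at pi1 = psi + theta/2, pi2 = psi - theta/2 (0*log 0 = 0)."""
    cells = ((a, psi + theta / 2), (m - a, 1 - psi - theta / 2),
             (b, psi - theta / 2), (n - b, 1 - psi + theta / 2))
    total = 0.0
    for count, prob in cells:
        if count > 0:
            if prob <= 0.0:
                return -math.inf
            total += count * math.log(prob)
    return total


def profile_psi(a, m, b, n, theta, iters=100):
    """psi_theta: golden-section search for the MLE of psi on |theta|/2 <= psi <= 1 - |theta|/2."""
    lo, hi = abs(theta) / 2, 1 - abs(theta) / 2
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = loglik(a, m, b, n, theta, x1), loglik(a, m, b, n, theta, x2)
    for _ in range(iters):
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = loglik(a, m, b, n, theta, x2)
        else:        # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = loglik(a, m, b, n, theta, x1)
    return (lo + hi) / 2


def score_stat(a, m, b, n, theta, lam):
    """|theta_hat - theta| / sqrt(lam * V(psi_theta, theta))."""
    est = a / m - b / n
    psi = profile_psi(a, m, b, n, theta)
    var = lam * ((psi + theta / 2) * (1 - psi - theta / 2) / m
                 + (psi - theta / 2) * (1 - psi + theta / 2) / n)
    if var <= 0.0:
        return 0.0 if theta == est else math.inf
    return abs(est - theta) / math.sqrt(var)


def mee_interval(a, m, b, n, z=Z, miettinen_nurminen=False, iters=60):
    """Method 5 (lambda = 1) or method 6 (lambda = (m+n)/(m+n-1))."""
    lam = (m + n) / (m + n - 1) if miettinen_nurminen else 1.0
    est = a / m - b / n

    def limit(bound):
        # Bisection between est (statistic 0) and bound (-1 or +1).
        if score_stat(a, m, b, n, bound, lam) <= z:
            return bound
        inner, outer = est, bound
        for _ in range(iters):
            mid = (inner + outer) / 2
            if score_stat(a, m, b, n, mid, lam) <= z:
                inner = mid
            else:
                outer = mid
        return (inner + outer) / 2

    return limit(-1.0), limit(1.0)


if __name__ == "__main__":
    print(mee_interval(56, 70, 48, 80))
    print(mee_interval(56, 70, 48, 80, miettinen_nurminen=True))
```

For contrast (a) of Table II, $56/70 - 48/80$, the two calls should come out close to the tabulated intervals for methods 5 and 6. The nested numerical search is one reason these methods need a computer program.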
Miettinen and Nurminen⁶ also considered a true profile likelihood method, involving
$$\{\theta : \ln \Lambda(\theta, \psi_\theta) - \ln \Lambda(\hat\theta, \hat\psi) \ge -z^2/2\},$$
where $\Lambda$ denotes the likelihood function. They concluded it was theoretically inferior, for reasons set out by Cox and Reid;⁹ it is not amenable to continuity correction to mitigate its anti-conservatism.
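A similar Python sketch for the true profile likelihood interval (method 7) replaces the score criterion by the log-likelihood ratio. As before, $\psi_\theta$ is obtained numerically rather than from Appendix II, and the names are hypothetical. Because the binomial log-likelihood is jointly concave in $(\theta, \psi)$, the profile log-likelihood is concave in $\theta$, so bisection on each side of $\hat\theta$ is safe.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)


def loglik(a, m, b, n, theta, psi):
    """Binomial log-likelihood at pi1 = psi + theta/2, pi2 = psi - theta/2; empty cells omitted."""
    cells = ((a, psi + theta / 2), (m - a, 1 - psi - theta / 2),
             (b, psi - theta / 2), (n - b, 1 - psi + theta / 2))
    total = 0.0
    for count, prob in cells:
        if count > 0:
            if prob <= 0.0:
                return -math.inf
            total += count * math.log(prob)
    return total


def profile_psi(a, m, b, n, theta, iters=100):
    """Golden-section search for psi_theta on |theta|/2 <= psi <= 1 - |theta|/2."""
    lo, hi = abs(theta) / 2, 1 - abs(theta) / 2
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = loglik(a, m, b, n, theta, x1), loglik(a, m, b, n, theta, x2)
    for _ in range(iters):
        if f1 < f2:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = loglik(a, m, b, n, theta, x2)
        else:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = loglik(a, m, b, n, theta, x1)
    return (lo + hi) / 2


def log_lr(a, m, b, n, theta):
    """ln L(theta, psi_theta) - ln L(theta_hat, psi_hat), omitting empty cells."""
    ll_max = sum(k * math.log(k / size)
                 for k, size in ((a, m), (m - a, m), (b, n), (n - b, n)) if k > 0)
    return loglik(a, m, b, n, theta, profile_psi(a, m, b, n, theta)) - ll_max


def profile_interval(a, m, b, n, z=Z, iters=60):
    """Method 7: all theta with log_lr >= -z^2/2."""
    est, crit = a / m - b / n, -z * z / 2

    def limit(bound):
        if log_lr(a, m, b, n, bound) >= crit:
            return bound
        inner, outer = est, bound
        for _ in range(iters):
            mid = (inner + outer) / 2
            if log_lr(a, m, b, n, mid) >= crit:
                inner = mid
            else:
                outer = mid
        return (inner + outer) / 2

    return limit(-1.0), limit(1.0)


if __name__ == "__main__":
    print(profile_interval(56, 70, 48, 80))
```

For contrast (f), $0/10 - 0/10$, the criterion reduces to $10\ln(1 - |\theta|) \ge -z^2/2$, giving limits of about $\pm 0.1748$, as in Table II.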
An alternative approach involving precisely computed tail probabilities based on $\theta$ and $\psi_\theta$ was found greatly superior to existing methods for the paired difference case,¹⁰ and is included in the present evaluation also. This approach leads naturally to consideration of a pair of methods.
A so-called ‘exact’ definition of tail probabilities aims to align the minimum coverage probability (CP) with $1-\alpha$. Alternatively, ‘mid-p’ tail areas¹¹⁻¹³ represent the attempt to achieve a mean coverage of $1-\alpha$.
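The two kinds of tail area can be illustrated for a fixed, hypothetical $(\theta, \psi)$. The Python sketch below enumerates the exact distribution of $A/m - B/n$ and forms $kP_x$ plus the appropriate tail sum, with $k = 1$ for ‘exact’ and $k = 0.5$ for ‘mid-p’ tails. In the actual methods $\psi$ would be set to $\psi_\theta$ and the criteria solved for $\theta$; this sketch does not attempt that step, and the function names are illustrative.

```python
from math import comb


def diff_distribution(m, n, theta, psi):
    """Distribution of A/m - B/n for A ~ Bin(m, pi1), B ~ Bin(n, pi2).

    Keys are the integers a*n - b*m, so xi = key / (m*n) is compared exactly.
    """
    pi1, pi2 = psi + theta / 2, psi - theta / 2
    dist = {}
    for a in range(m + 1):
        pa = comb(m, a) * pi1 ** a * (1 - pi1) ** (m - a)
        for b in range(n + 1):
            pb = comb(n, b) * pi2 ** b * (1 - pi2) ** (n - b)
            key = a * n - b * m
            dist[key] = dist.get(key, 0.0) + pa * pb
    return dist


def upper_tail(dist, x_key, k):
    """k * P_x + sum of P_xi over xi > x."""
    return k * dist.get(x_key, 0.0) + sum(p for key, p in dist.items() if key > x_key)


def lower_tail(dist, x_key, k):
    """k * P_x + sum of P_xi over xi < x."""
    return k * dist.get(x_key, 0.0) + sum(p for key, p in dist.items() if key < x_key)


if __name__ == "__main__":
    m, n, a, b = 5, 7, 3, 2
    dist = diff_distribution(m, n, theta=0.3, psi=0.5)  # hypothetical (theta, psi)
    x_key = a * n - b * m
    print(upper_tail(dist, x_key, 1.0), upper_tail(dist, x_key, 0.5))
```

The exact and mid-p tails differ by $0.5P_x$. When $m = n$ many outcomes share the same value of $a/m - b/n$ (only $2n + 1$ distinct values), so $P_x$ tends to be larger; for coprime $m$ and $n$ almost every outcome has its own value.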
The simple asymptotic method, without and with the continuity correction, and the methods described above (the Haldane and Jeffreys–Perks methods, the methods of Mee and of Miettinen and Nurminen, the true profile likelihood method and the two tail area methods) constitute methods 1 to 9 of the 11 methods considered in the present evaluation; in general, there is a progressive improvement in performance from the simple asymptotic method 1 to method 9, at the cost of greatly increased computational complexity. Nevertheless, the tail area profile likelihood methods 8 and 9, which are the most complex of any evaluated here, but which, as we will show, have the best coverage and location properties, display a novel anomaly. Suppose $a$, $m$ and the ratio $p_2 = b/n$ are held constant, while $n \to \infty$ through values which keep $b$ integer-valued. We would expect that a good method for $a/m - b/n$ would produce a sequence of intervals, each nested within its predecessor, tending asymptotically towards some corresponding interval for the single proportion, shifted by the constant $p_2$. Yet these methods give a sequence of lower limits which increase up to a certain $n$, but subsequently decrease, violating this expectation. Evaluation of just what the lower limit is converging towards is computationally prohibitive, but there is clearly an anomaly here.
Now, the simple method's asymptotic behaviour is clearly appropriate in this respect. The anomaly can only arise because the reparameterization $(\theta, \psi)$ leads to disregard of the fact that $a/m$ and $b/n$ are independently sufficient statistics for $\pi_1$ and $\pi_2$, respectively. A simple combination of single-sample intervals for $a/m$ and $b/n$, which avoids the deficiencies of methods 1 and 2, is thus worth considering; indeed, it would seem to correspond most closely to the chosen conditioning on $m$ and $n$ only. We may combine Wilson⁴ score intervals (without or with continuity correction) for each single proportion in much the same way as the simple method is constructed. The resulting methods avoid all aberrations.
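The single-proportion building block is the Wilson score interval, whose limits are the two roots of $|\pi - p| = z\sqrt{\pi(1-\pi)/n}$ and have a closed form. A minimal Python sketch, with an illustrative function name:

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)


def wilson_interval(x, n, z=Z):
    """Roots of |pi - x/n| = z * sqrt(pi * (1 - pi) / n) (Wilson score interval)."""
    p = x / n
    centre = 2 * n * p + z * z
    half = z * math.sqrt(z * z + 4 * n * p * (1 - p))
    denom = 2 * (n + z * z)
    return (centre - half) / denom, (centre + half) / denom


if __name__ == "__main__":
    print(wilson_interval(56, 70))  # single-sample interval for p1 = 56/70
    print(wilson_interval(0, 10))   # lower limit 0 at the boundary
```

Both roots lie in [0, 1] by construction, so intervals assembled from them stay within bounds.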
2. METHODS COMPARED
Eleven unconditional methods were selected for comparison. Only methods 1 to 4 are capable of violating the $[-1, +1]$ boundaries, in which case the resulting interval is truncated. Methods 1 to 4, 10 and 11 involve direct computation; methods 5 to 9 are iterative, of which methods 8 and 9 are the most complex. For notation see Table I, and see Appendix II for the form of $\psi_\theta$:
1. Simple asymptotic method, no continuity correction:
$\hat\theta \pm z\sqrt{ac/m^3 + bd/n^3}$.
2. Simple asymptotic method, with continuity correction (reference 14, p. 29):
$\hat\theta \pm \left(z\sqrt{ac/m^3 + bd/n^3} + (1/m + 1/n)/2\right)$.
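3. Beal's Haldane method:³ limits are $\theta^* \pm w$, where
$$\theta^* = \frac{\hat\theta + z^2\nu(1 - 2\tilde\psi)}{1 + z^2\omega},$$
$$w = \frac{z}{1 + z^2\omega}\sqrt{\omega\{4\tilde\psi(1-\tilde\psi) - \hat\theta^2\} + 2\nu(1-2\tilde\psi)\hat\theta + 4z^2\omega^2(1-\tilde\psi)\tilde\psi + z^2\nu^2(1-2\tilde\psi)^2},$$
$\tilde\psi = (a/m + b/n)/2$, $\omega = (1/m + 1/n)/4$ and $\nu = (1/m - 1/n)/4$.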
4. Beal's Jeffreys–Perks method:³ as above, but with
$$\tilde\psi = \frac12\left(\frac{a + 0.5}{m + 1} + \frac{b + 0.5}{n + 1}\right).$$
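5. Method of Mee:⁵ the interval is
$$\left\{\theta : |\hat\theta - \theta| \le z\sqrt{\lambda\left[\frac{(\psi_\theta + \theta/2)(1 - \psi_\theta - \theta/2)}{m} + \frac{(\psi_\theta - \theta/2)(1 - \psi_\theta + \theta/2)}{n}\right]}\right\},$$
where $\lambda = 1$.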
6. Method of Miettinen and Nurminen (reference 6, equations 8 and 9 with Wilson-form variance): as method 5, but $\lambda = (m + n)/(m + n - 1)$.
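7. True profile likelihood method (reference 6, appendix III): the interval consists of all $\theta$ satisfying
$$a\ln\frac{\psi_\theta + \theta/2}{a/m} + b\ln\frac{\psi_\theta - \theta/2}{b/n} + c\ln\frac{1 - \psi_\theta - \theta/2}{c/m} + d\ln\frac{1 - \psi_\theta + \theta/2}{d/n} \ge -\frac{z^2}{2},$$
omitting any terms corresponding to empty cells.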
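8. Profile likelihood method based on ‘exact’ tail areas: the interval is $L \le \theta \le U$ such that
(i) if $L \le \theta \le \hat\theta$, $kP_x + \sum_{x < \xi \le 1} P_\xi \ge \alpha/2$;
(ii) if $\hat\theta \le \theta \le U$, $kP_x + \sum_{-1 \le \xi < x} P_\xi \ge \alpha/2$,
where $P_\xi = \Pr[A/m - B/n = \xi \mid \theta, \psi_\theta]$, $x = a/m - b/n$ and $k = 1$.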
9. Profile likelihood method based on ‘mid-p’ tail areas: as method 8, but with $k = 0.5$.
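10. Method based on the Wilson⁴ score method for the single proportion, without continuity correction: $L = \hat\theta - \delta$, $U = \hat\theta + \varepsilon$, where
$$\delta = \sqrt{(a/m - l_1)^2 + (u_2 - b/n)^2} = z\sqrt{l_1(1 - l_1)/m + u_2(1 - u_2)/n},$$
$$\varepsilon = \sqrt{(u_1 - a/m)^2 + (b/n - l_2)^2} = z\sqrt{u_1(1 - u_1)/m + l_2(1 - l_2)/n};$$
$l_1$ and $u_1$ are the roots of $|\pi_1 - a/m| = z\sqrt{\pi_1(1 - \pi_1)/m}$, and $l_2$ and $u_2$ are the roots of $|\pi_2 - b/n| = z\sqrt{\pi_2(1 - \pi_2)/n}$.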
11. Method using continuity-corrected score intervals (reference 14, pp. 13–14): as method 10, but $l_1$ and $u_1$ delimit the interval
$$\{\pi_1 : |\pi_1 - a/m| - 1/(2m) \le z\sqrt{\pi_1(1 - \pi_1)/m}\}.$$
Note that if $a = 0$, $l_1 = 0$; if $c = 0$, $u_1 = 1$. Similarly, $l_2$ and $u_2$ delimit the interval
$$\{\pi_2 : |\pi_2 - b/n| - 1/(2n) \le z\sqrt{\pi_2(1 - \pi_2)/n}\}.$$
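Methods 1, 2, 10 and 11 involve direct computation and are easily coded. The following Python sketch is one possible implementation. It assumes the standard closed-form roots for the Wilson interval and for its continuity-corrected version (the latter being the usual closed form of the interval defined for method 11, with $l = 0$ when $x = 0$ and $u = 1$ when $x = n$). The function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # 1.95996... for two-sided 95 per cent intervals


def wilson(x, n, z=Z, cc=False):
    """Single-proportion score limits (l, u), optionally continuity-corrected."""
    p = x / n
    denom = 2 * (n + z * z)
    if not cc:
        half = z * math.sqrt(z * z + 4 * n * p * (1 - p))
        return (2 * n * p + z * z - half) / denom, (2 * n * p + z * z + half) / denom
    # Continuity-corrected roots; l = 0 when x = 0 and u = 1 when x = n.
    lo = 0.0 if x == 0 else (2 * n * p + z * z - 1 - z * math.sqrt(
        z * z - 2 - 1 / n + 4 * p * (n * (1 - p) + 1))) / denom
    hi = 1.0 if x == n else (2 * n * p + z * z + 1 + z * math.sqrt(
        z * z + 2 - 1 / n + 4 * p * (n * (1 - p) - 1))) / denom
    return lo, hi


def score_hybrid(a, m, b, n, z=Z, cc=False):
    """Method 10 (cc=False) or method 11 (cc=True): L = est - delta, U = est + eps."""
    p1, p2 = a / m, b / n
    l1, u1 = wilson(a, m, z, cc)
    l2, u2 = wilson(b, n, z, cc)
    delta = math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    eps = math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return p1 - p2 - delta, p1 - p2 + eps


def simple_asymptotic(a, m, b, n, z=Z, cc=False):
    """Method 1 (cc=False) or method 2 (cc=True), truncated to [-1, +1]."""
    est = a / m - b / n
    half = z * math.sqrt(a * (m - a) / m ** 3 + b * (n - b) / n ** 3)
    if cc:
        half += (1 / m + 1 / n) / 2
    return max(-1.0, est - half), min(1.0, est + half)


if __name__ == "__main__":
    for name, f, cc in [("1", simple_asymptotic, False), ("2", simple_asymptotic, True),
                        ("10", score_hybrid, False), ("11", score_hybrid, True)]:
        lo, hi = f(56, 70, 48, 80, cc=cc)
        print(f"method {name}: {lo:+.4f}, {hi:+.4f}")
```

Run on contrast (a), $56/70 - 48/80$, the four functions should agree with column (a) of Table II to four decimal places.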
Table II shows the eleven methods applied to eight selected combinations of $a$, $m$, $b$ and $n$, representing all of cases NZ, OZ, RZ and DZ. Clearly when all four cell frequencies are large, as in example (a) (reference 14, p. 101), all methods produce rather similar intervals. Choice between methods is more critical when the numbers are smaller, as in cases (b) to (h). The degree of concordance with hypothesis testing is limited, not surprisingly, as the conditioning is different: contrast (d) (Goodfield et al.,¹⁵ cited by Altman and Stepniewska¹⁶) exemplifies the anti-conservatism of methods 1 and 7, whilst (c) suggests methods 8 and 11 are conservative. Overt overshoot (see below) can occur with methods 2 and 4 and indeed also method 1. Limits inappropriately equal to $\hat\theta$ can occur with methods 1 and 3.

Table II. 95 per cent confidence intervals for selected contrasts, calculated using eleven methods. Asterisked values denote aberrations (limits beyond ±1 or inappropriately equal to θ̂)

Contrasts: (a) 56/70 - 48/80; (b) 9/10 - 3/10; (c) 6/7 - 2/7; (d) 5/56 - 0/29; (e) 0/10 - 0/20; (f) 0/10 - 0/10; (g) 10/10 - 0/20; (h) 10/10 - 0/10

Method                  (a)                 (b)                 (c)                 (d)
1. Asympt, no CC        +0.0575, +0.3425    +0.2605, +0.9395    +0.1481, +0.9947    +0.0146, +0.1640
2. Asympt, CC           +0.0441, +0.3559    +0.1605, >+1.0000*  +0.0053, >+1.0000*  -0.0116, +0.1901
3. Haldane              +0.0535, +0.3351    +0.1777, +0.8289    +0.0537, +0.8430    -0.0039, +0.1463
4. Jeffreys–Perks       +0.0531, +0.3355    +0.1760, +0.8306    +0.0524, +0.8443    -0.0165, +0.1595
5. Mee                  +0.0533, +0.3377    +0.1821, +0.8370    +0.0544, +0.8478    -0.0313, +0.1926
6. Miettinen–Nurminen   +0.0528, +0.3382    +0.1700, +0.8406    +0.0342, +0.8534    -0.0326, +0.1933
7. True profile         +0.0547, +0.3394    +0.2055, +0.8634    +0.0760, +0.8824    +0.0080, +0.1822
8. ‘Exact’              +0.0529, +0.3403    +0.1393, +0.8836    -0.0104, +0.9062    -0.0302, +0.1962
9. ‘Mid-p’              +0.0539, +0.3393    +0.1834, +0.8640    +0.0470, +0.8840    -0.0233, +0.1868
10. Score, no CC        +0.0524, +0.3339    +0.1705, +0.8090    +0.0582, +0.8062    -0.0381, +0.1926
11. Score, CC           +0.0428, +0.3422    +0.1013, +0.8387    -0.0290, +0.8423    -0.0667, +0.2037

Method                  (e)                 (f)                 (g)                 (h)
1. Asympt, no CC        0.0000*, 0.0000*    0.0000*, 0.0000*    +1.0000*, +1.0000   +1.0000*, +1.0000
2. Asympt, CC           -0.0750, +0.0750    -0.1000, +0.1000    +0.9250, >+1.0000*  +0.9000, >+1.0000*
3. Haldane              0.0000*, +0.0839    0.0000*, 0.0000*    +0.7482, +1.0000    +0.6777, +1.0000
4. Jeffreys–Perks       -0.0965, +0.1746    -0.1672, +0.1672    +0.7431, >+1.0000*  +0.6777, +1.0000
5. Mee                  -0.1611, +0.2775    -0.2775, +0.2775    +0.7225, +1.0000    +0.6777, +1.0000
6. Miettinen–Nurminen   -0.1658, +0.2844    -0.2879, +0.2879    +0.7156, +1.0000    +0.6636, +1.0000
7. True profile         -0.0916, +0.1748    -0.1748, +0.1748    +0.8252, +1.0000    +0.8169, +1.0000
8. ‘Exact’              -0.1684, +0.3085    -0.3085, +0.3085    +0.6915, +1.0000    +0.6631, +1.0000
9. ‘Mid-p’              -0.1391, +0.2589    -0.2589, +0.2589    +0.7411, +1.0000    +0.7218, +1.0000
10. Score, no CC        -0.1611, +0.2775    -0.2775, +0.2775    +0.6791, +1.0000    +0.6075, +1.0000
11. Score, CC           -0.2005, +0.3445    -0.3445, +0.3445    +0.6014, +1.0000    +0.5128, +1.0000

CC: continuity correction
3. CRITERIA FOR EVALUATION
The present evaluation presupposes the principles set out in detail by Newcombe.² In brief, a good method will avoid all aberrations and produce an appropriate distribution of coverage probabilities. The coverage probability² CP is defined as $\Pr[L \le \theta \le U]$, where $L$ and $U$ are the calculated limits. The ‘exact’ criterion requires $\mathrm{CP} \ge 1 - \alpha$ for all points in the parameter space, but by the smallest attainable margin. We interpret² the ‘mid-p’ criterion to imply that for any $m$ and $n$ the mean coverage probability $\overline{\mathrm{CP}}$ is to be close to but not below $1 - \alpha$, and $\min_{0<\theta<1} \mathrm{CP}$ should not be too far below $1 - \alpha$, for chosen $m$ and $n$ or for any $m$ and $n$. All methods evaluated aim to have $\alpha/2$ non-coverage at each end, except in boundary cases ($\hat\theta = \pm 1$). In recognition of the importance of interval location, we examine symmetry as well as degree of coverage. When, as here, all methods are equivariant,¹⁷ that is, show appropriate properties of symmetry about 0 when $p_1$ and $p_2$ are interchanged or replaced by $1 - p_1$ or $1 - p_2$, equality of left and right non-coverage as $\theta$ ranges from $-1$ to $+1$ is gratuitous. It is more pertinent to distinguish probabilities of non-coverage at the mesial (closer to 0) and distal (closer to $\pm 1$) ends of the interval. Here, as with the single proportion, the relationship of coverage to the parameter value shows many discontinuities (see, for example, graphs in Vollset¹). To minimize the potential for distortion as a result of this, $\theta$ and $\psi$ are randomly sampled real numbers, not rationals, and similarly we avoid use of selected, round values of $m$ and $n$.
The ‘exactness’ claimed for any method can only relate to its mode of derivation, and does not carry across to the achieved coverage probabilities for specific combinations of $m$, $n$, $\theta$ and $\psi$. It is necessary to evaluate these for a representative set of points in the parameter space. It is also important to examine the location of $L$ and $U$ in relation to each other, to boundaries that should not be violated ($-1$ and $+1$), and to boundaries that should be violable (that is, avoidance of inappropriate tethering to $\hat\theta$, as defined below).
For any method of setting a confidence interval $[L, U]$ for $\theta = \pi_1 - \pi_2$, with $L \le \hat\theta \le U$, two properties are considered desirable:
(i) Appropriate coverage and location: $L \le \theta \le U$ should occur with probability $1 - \alpha$, and $L > \theta$ and $U < \theta$ each with probability $\alpha/2$.
(ii) Avoidance of aberrations, defined as follows.
Several kinds of aberrations can arise, principally point estimate tethering and overshoot. Tethering occurs when one or both of the calculated limits $L$ and $U$ coincide with the point estimate $\hat\theta$. In the extreme case where $\hat\theta = +1$, it is appropriate that $U = \hat\theta$; likewise that $L = \hat\theta$ when $\hat\theta = -1$. Otherwise tethering infringes the principle that the CI should represent some ‘margin of error’ on both sides of $\hat\theta$, and is counted as adverse. Bilateral point estimate tethering, $L = \hat\theta = U$, constitutes a degenerate or zero-width interval (ZWI) and is always inappropriate.
Point estimate tethering can occur only with methods 1 and 3, in case RZ (two zeros in the same row) and in case DZ (two zeros on the same diagonal), exemplified in Table II, contrasts (e) to (h). The Haldane method produces appropriate, unilateral tethering in case DZ. In case RZ, it produces unilateral tethering if $m \ne n$, and a ZWI at $\hat\theta = 0$ if $m = n$, both of which are inappropriate. Method 1 produces a totally inappropriate ZWI at 0 in case RZ, and a ZWI at $+1$ (or $-1$) in case DZ, for which unilateral tethering would be appropriate. In case DZ, the Jeffreys–Perks method reduces to the Haldane method if $m = n$, and otherwise produces overt overshoot.
Overt overshoot (OO) occurs when either calculated limit is outside $[-1, +1]$; $U = +1$ is not counted as aberrant when $\hat\theta = +1$, and correspondingly at $-1$. Methods 1 to 4 are liable to produce OO, in which case we truncate the resulting interval to be a subset of $[-1, +1]$; instances in which OO would otherwise occur are counted in the evaluation.
Furthermore, methods 3 and 4 substitute an estimate $\tilde\psi$ for $\psi$ which is formed without reference to $\theta$, and very often one or two of the implied parameters
$$\tilde\pi_{1L} = \tilde\psi + L/2, \quad \tilde\pi_{2L} = \tilde\psi - L/2, \quad \tilde\pi_{1U} = \tilde\psi + U/2, \quad \tilde\pi_{2U} = \tilde\psi - U/2$$
lie outside $[0, 1]$. This anomaly is termed latent overshoot (LO); overt overshoot always implies latent overshoot, but latent overshoot can occur in the absence of overt overshoot, when inherent bounds for $\theta$ are not violated but the bounding rhombus $\tfrac12|\theta| \le \psi \le 1 - \tfrac12|\theta|$ is. The formulae still work, and do not indicate that anything peculiar has occurred. In this evaluation the frequency of occurrence of any LO (irrespective of whether it involves one or two implied parameters, and of whether OO also occurs) is obtained for methods 3 and 4, using the chosen $\alpha = 0.05$. Unlike overt overshoot, latent overshoot cannot effectively be eliminated by truncation; as well as affecting coverage, such truncation can produce inappropriate point estimate tethering, or even an interval that excludes $\hat\theta$. Latent overshoot and its sequelae, like the inappropriate asymptotic behaviour of methods 8 and 9, appear to be a consequence of losing the simplicity of using information concerning $\pi_1$ and $\pi_2$ separately.
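The aberrations defined above can be detected mechanically. The Python sketch below computes the untruncated limits of methods 3 and 4 from the formulae in Section 2, then flags overt overshoot, zero-width intervals, inappropriate unilateral tethering and latent overshoot (via the four implied parameters). The tolerance used to decide equality with $\hat\theta$, the guard on the square root, and all names are assumptions of this sketch.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)


def beal_limits(a, m, b, n, z=Z, jeffreys_perks=False):
    """Untruncated (lower, upper, psi_tilde) for method 3 (Haldane) or method 4 (Jeffreys-Perks)."""
    est = a / m - b / n
    if jeffreys_perks:
        psi = 0.5 * ((a + 0.5) / (m + 1) + (b + 0.5) / (n + 1))
    else:
        psi = 0.5 * (a / m + b / n)
    omega, nu = 0.25 * (1 / m + 1 / n), 0.25 * (1 / m - 1 / n)
    k = 1 + z * z * omega
    centre = (est + z * z * nu * (1 - 2 * psi)) / k
    radicand = (omega * (4 * psi * (1 - psi) - est ** 2)
                + 2 * nu * (1 - 2 * psi) * est
                + 4 * z * z * omega ** 2 * (1 - psi) * psi
                + z * z * nu ** 2 * (1 - 2 * psi) ** 2)
    # Guard against a tiny negative radicand from rounding (an assumption of this sketch).
    w = z / k * math.sqrt(max(radicand, 0.0))
    return centre - w, centre + w, psi


def aberrations(lower, upper, est, psi_tilde=None, tol=1e-9):
    """Flag overt overshoot, ZWI, inappropriate tethering and latent overshoot."""
    flags = set()
    if lower < -1 - tol or upper > 1 + tol:
        flags.add("overt overshoot")
    at_lower, at_upper = abs(lower - est) < tol, abs(upper - est) < tol
    if at_lower and at_upper:
        flags.add("ZWI")
    elif (at_lower and est > -1 + tol) or (at_upper and est < 1 - tol):
        flags.add("tethering")  # unilateral tethering away from +/-1
    if psi_tilde is not None:
        implied = (psi_tilde + lower / 2, psi_tilde - lower / 2,
                   psi_tilde + upper / 2, psi_tilde - upper / 2)
        if any(p < -tol or p > 1 + tol for p in implied):
            flags.add("latent overshoot")
    return flags


if __name__ == "__main__":
    lo, hi, psi = beal_limits(0, 10, 0, 20)  # contrast (e), Haldane
    print(lo, hi, aberrations(lo, hi, 0.0, psi))
    lo, hi, psi = beal_limits(10, 10, 0, 20, jeffreys_perks=True)  # contrast (g)
    print(lo, hi, aberrations(lo, hi, 1.0, psi))
```

For example, the Haldane limits for contrast (e), $0/10 - 0/20$, are tethered at 0, while the Jeffreys–Perks upper limit for contrast (g), $10/10 - 0/20$, exceeds +1 and so shows both overt and latent overshoot, in line with Table II.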
4. EVALUATION OF THE ELEVEN METHODS
The main evaluation of coverage is based on a sample of 9200 parameter space points (PSPs) $(m, n, \psi, \theta)$, with $m$ and $n$ between 5 and 50 inclusive. This approach is adopted for reasons set out in a preceding article.² The computer-intensive part of the process is the setting up of ‘tables’ of intervals for each $(m, n)$ pair for the iterative methods, especially 8 and 9. Therefore a subset of 230 out of the 2116 possible pairs was selected (Figure 1), comprising all 46 diagonal entries with $m = n$, together with 92 pairs $(m, n)$ with $m \ne n$ and the corresponding reversed pairs $(n, m)$, which require the same tables. For each of $m = 5, 6, \ldots, 50$, two values of $n$ were chosen, avoiding diagonal elements, duplicates and mirror-image pairs. These were selected so that $m$ and $n$ should be uncorrelated and that the distributions of $|m - n|$ and of the highest common factor (HCF) of $m$ and $n$ should be very close to those for all 2070 off-diagonal points. The rationale for examining the HCF is that when tail areas are defined in terms of $a/m - b/n$, the difference between ‘mid-p’ and ‘exact’ limits could be great when $m = n$ but relatively small when $m$ and $n$ are coprime. The configuration shown, the result of an iterative search, has $m$ and $n$ uncorrelated (Pearson's $r = -0.00006$), and Kolmogorov–Smirnov statistics for $|m - n|$ and HCF of 0.011 and 0.012, respectively. Thus the chosen set of off-diagonal $(m, n)$ combinations may be regarded as representative, in all important respects, of all 2070 possible ones; these are used together with a deliberate over-representation of diagonal pairs, in view of how commonly equal denominators occur, to give a set of $(m, n)$ pairs which may be regarded as typical.
Figure 1. 230 (m,n) pairs chosen for the main evaluation
For each of the 230 $(m, n)$ pairs, 40 $(\psi, \theta)$ pairs were chosen, with $\theta = \lambda\{1 - |2\psi - 1|\}$ and $\psi, \lambda \in (0, 1)$, all sampling being random and independent using algorithm AS183.¹⁸ This resulted in a set of $\theta$ values with median 0.193 and quartiles 0.070 and 0.393. For each chosen point $(m, n, \psi, \theta)$ of the parameter space, frequencies of all possible outcomes were determined as products of binomial probabilities. The achieved probability of coverage of $\theta$ by nominally 95 per cent confidence intervals calculated by each of the 11 methods was computed by summing all appropriate non-negligible terms. Mean coverage was also examined for subsets of the parameter space defined according to HCF, $\min(m, n)$, minimum expected frequency, $\psi$ and $\theta$ in turn. Minimum coverage probabilities were also examined. Mesial and distal non-coverage rates were computed similarly, as were incidences of overt overshoot for methods 1 to 4, and latent overshoot at $1 - \alpha = 0.95$ for methods 3 and 4. Probabilities of inappropriate tethering for methods 1 and 3 were computed by examining frequencies of occurrence of cases RZ and DZ, in conjunction with whether $m$ and $n$ were equal. Additionally, mean and minimum coverage probabilities for nominal 90 per cent and 99 per cent intervals were calculated for the same set of 9200 parameter space points.
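The sampling of $(\psi, \theta)$ described above can be sketched in Python, including a transcription of the Wichmann–Hill generator (algorithm AS183). The seeds and sample size are arbitrary choices for illustration, and no attempt is made to reproduce the study's 9200 points. Because $|2\psi - 1|$ is itself uniform on (0, 1), $\theta$ is distributed as the product of two independent uniforms, whose median solves $t - t\ln t = \tfrac12$, roughly 0.187; this is in line with the median of 0.193 reported for the actual sample.

```python
import statistics


class WichmannHill:
    """Wichmann-Hill uniform generator (algorithm AS183): three congruential streams."""

    def __init__(self, s1=1, s2=2, s3=3):
        # Seeds are expected to lie in 1..30000.
        self.s1, self.s2, self.s3 = s1, s2, s3

    def random(self):
        self.s1 = (171 * self.s1) % 30269
        self.s2 = (172 * self.s2) % 30307
        self.s3 = (170 * self.s3) % 30323
        return (self.s1 / 30269 + self.s2 / 30307 + self.s3 / 30323) % 1.0


def sample_psi_theta(rng, count):
    """(psi, theta) pairs with theta = lambda * (1 - |2*psi - 1|), psi and lambda uniform."""
    pairs = []
    for _ in range(count):
        psi, lam = rng.random(), rng.random()
        pairs.append((psi, lam * (1 - abs(2 * psi - 1))))
    return pairs


if __name__ == "__main__":
    pairs = sample_psi_theta(WichmannHill(123, 456, 789), 20000)
    print(statistics.median([theta for _, theta in pairs]))
```

Every sampled $\theta$ lies inside the admissible triangle $0 < \theta < 1 - |2\psi - 1|$, so no rejection step is needed.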
To check further the effect of restricting to the chosen $(m, n)$ pairs, coverage of 95 per cent intervals by methods 10 and 11 only was evaluated on a further set of 7544 parameter space points, 4 for each of the 1886 unused $(m, n)$ pairs, with $\psi$ and $\theta$ sampled as above.
A third evaluation examined the coverage of 95 per cent intervals by methods 10 and 11 only, when applied to the comparison of proportions with very large denominators but small to moderate numerators. A total of 1000 parameter space points were chosen, with $\log_{10} m$ and $\log_{10} n \in (2, 5)$, and $\log_{10}(4m\pi_1)$ and $\log_{10}(4n\pi_2) \in (0, 2)$, all sampling being independent and random.
In the first two evaluations, $\theta$ is positive, and left and right non-coverage are interpreted as mesial and distal non-coverage, respectively, with probabilities denoted here by MNCP and DNCP. Thus
$$\mathrm{MNCP} = \sum_{\{a,b:\, l > \theta\}} p_{ab}, \qquad \mathrm{DNCP} = \sum_{\{a,b:\, u < \theta\}} p_{ab},$$
where $l$ and $u$ are the calculated limits corresponding to observed numerators $a$ and $b$, and $p_{ab} = \Pr[A = a, B = b \mid \theta, \psi]$. In the third evaluation, as in the intended application, $\theta$ may be of either sign, and mesial and distal non-coverage were imputed accordingly.
Expected interval width was calculated exactly for 95 per cent intervals by each method, truncated to lie within $[-1, +1]$ where necessary, for $\pi_1 = \pi_2 = 0.5$ or $0.01$, with $m$ and $n$ each equal to 10 or 100.
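The coverage computation can be sketched for a single parameter space point: every outcome $(a, b)$ is enumerated, weighted by its binomial probability $p_{ab}$, and classified as covered, mesially non-covered or distally non-covered, while the expected width accumulates at the same time. This Python sketch uses method 10 as the example, assumes $\theta > 0$ (so the mesial end is the left one), and sums all terms rather than only the non-negligible ones; the names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)


def wilson(x, n, z=Z):
    """Wilson score limits for x/n."""
    p = x / n
    half = z * math.sqrt(z * z + 4 * n * p * (1 - p))
    denom = 2 * (n + z * z)
    return (2 * n * p + z * z - half) / denom, (2 * n * p + z * z + half) / denom


def method10(a, m, b, n, z=Z):
    """Score-based hybrid interval (method 10)."""
    p1, p2 = a / m, b / n
    l1, u1 = wilson(a, m, z)
    l2, u2 = wilson(b, n, z)
    return (p1 - p2 - math.hypot(p1 - l1, u2 - p2),
            p1 - p2 + math.hypot(u1 - p1, p2 - l2))


def binom_pmf(k, size, p):
    return math.comb(size, k) * p ** k * (1 - p) ** (size - k)


def coverage(method, m, n, psi, theta):
    """Exact (CP, MNCP, DNCP, expected width) at one point with theta > 0."""
    pi1, pi2 = psi + theta / 2, psi - theta / 2
    cp = mncp = dncp = width = 0.0
    for a in range(m + 1):
        pa = binom_pmf(a, m, pi1)
        for b in range(n + 1):
            p_ab = pa * binom_pmf(b, n, pi2)
            lower, upper = method(a, m, b, n)
            width += p_ab * (upper - lower)
            if lower > theta:      # interval lies wholly beyond theta: mesial miss
                mncp += p_ab
            elif upper < theta:    # interval lies wholly towards 0: distal miss
                dncp += p_ab
            else:
                cp += p_ab
    return cp, mncp, dncp, width


if __name__ == "__main__":
    print(coverage(method10, 20, 15, psi=0.4, theta=0.2))
```

Methods that can overshoot would need their limits truncated to $[-1, +1]$ before the width is accumulated; method 10 never overshoots, so no truncation is applied here.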
5. RESULTS
Table III shows that in the main evaluation the coverage probability of nominal 95 per cent intervals, averaged over the 9200 parameter space points, ranged from 0.881 (method 1) to 0.979 (method 11). In addition to method 1, method 3 was also anti-conservative on average, and method 7 slightly so; method 8 was slightly conservative, whereas other methods had appropriate mean coverage rates.
The maximum CP of method 1 in this evaluation was only 0.9656; for all other methods some parameter space points have CP = 1. The coverage probability of method 1 is arbitrarily close to 0 in extreme cases: either DNCP can approach 1 (when $\psi \to 0$ or 1 and $\theta \to 0$) or MNCP can (when $\psi \approx 0.5$ and $\theta \to 1$), due to ZWIs at 0 and 1, respectively. The continuity correction of method 2, though adequate to correct the mean CP, yields an unacceptably low min CP of 0.5137 in this evaluation. DNCP approaches $\exp(-0.5) = 0.6065$ with $n \ll m$ but $n \to \infty$, $\theta = \tfrac12(1/m + 1/n) + \varepsilon$, $\pi_1 = 1 - \varepsilon$; a similar supremum applies to MNCP. The coverage of method 2 exhibited appropriate symmetry; for method 1, overall, distal non-coverage predominated in this evaluation.
Method 3, like method 1, can have CP arbitrarily close to 0, but only DNCP can approach 1, as $\psi \to 1$ (if $m \le n$) or $\psi \to 0$ (if $m \ge n$), due to inappropriate tethering at 0. Method 4 eliminates this deficiency as well as the low mean CP, and reduces the preponderance of distal non-coverage.
Methods 5 and 6 have very similar coverage properties to each other, with overall coverage similar to method 4. The coverage probability was only 0.8516 when $m = 42$, $n = 7$, $\psi = 0.9752$, $\theta = 0.0253$, with MNCP = 0.1484 and DNCP = 0; substantial distal non-coverage can also occur. These methods exhibited very good symmetry of coverage.
Coverage of method 7 was symmetrical but slightly anti-conservative on average, with min CP = 0.8299 ($m = 48$, $n = 23$, $\psi = 0.9751$, $\theta = 0.0806$, MNCP = 0.0344, DNCP = 0.1356). Values of either DNCP or MNCP around 0.14 can occur.
Even method 8 fails to be strictly conservative, the min CP in this evaluation being 0.9424, when $m = 32$, $n = 25$, $\psi = 0.2640$ and $\theta = 0.4016$. Here both MNCP (0.0279) and DNCP (0.0297) exceed the nominal $\alpha/2$; there are other parameter space points for which either exceeds 0.03. (This contrasts with the performance of the analogous method for the paired case,¹⁹ for which min CP was 0.9546, and DNCP (but not MNCP) was always less than 0.025.) The lowest coverage obtained for its ‘mid-p’ analogue, method 9, was 0.9131, at $m = n = 8$, $\psi = 0.4890$, $\theta = 0.4705$. Both these methods yielded symmetrical coverage.
For method 10, the lowest coverage obtained in the main evaluation was 0.8673, with $m = 35$, $n = 15$, $\psi = 0.5087$, $\theta = 0.9645$; this and other extremes arose entirely as distal non-coverage.
Table III. Estimated coverage probabilities for 95, 90 and 99 per cent confidence intervals calculated by 11 methods. Based on 9200 points in parameter space with 5 ≤ m ≤ 50, 5 ≤ n ≤ 50, 0 < ψ < 1 and 0 < θ < 1 − |2ψ − 1|

                        95% intervals                                      90% intervals    99% intervals
                        Coverage         Mesial NC        Distal NC        Coverage         Coverage
Method                  Mean    Min      Mean    Max      Mean    Max      Mean    Min      Mean    Min
1. Asympt, no CC        0.8807  0.0004   0.0417  0.7845   0.0775  0.9996   0.8322  0.0004   0.9253  0.0004
2. Asympt, CC           0.9623  0.5137   0.0183  0.4216   0.0194  0.4844   0.9401  0.5137   0.9811  0.5156
3. Haldane              0.9183  0.0035   0.0153  0.0656   0.0664  0.9965   0.8696  0.0035   0.9574  0.0035
4. Jeffreys–Perks       0.9561  0.8505   0.0140  0.0606   0.0299  0.1418   0.9123  0.7655   0.9896  0.9083
5. Mee                  0.9562  0.8516   0.0207  0.1484   0.0231  0.1064   0.9076  0.8057   0.9919  0.9470
6. Miettinen–Nurminen   0.9584  0.8516   0.0196  0.1484   0.0220  0.1064   0.9114  0.8057   0.9925  0.9478
7. True profile         0.9454  0.8299   0.0268  0.1440   0.0278  0.1384   0.8912  0.6895   0.9893  0.9613
8. ‘Exact’              0.9680  0.9424   0.0149  0.0308   0.0170  0.0317   0.9305  0.8862   0.9948  0.9881
9. ‘Mid-p’              0.9591  0.9131   0.0197  0.04996  0.0212  0.0470   0.9116  0.8374   0.9933  0.9847
10. Score, no CC        0.9602  0.8673   0.0134  0.0660   0.0264  0.1327   0.9162  0.8226   0.9916  0.9173
11. Score, CC           0.9793  0.9339   0.0061  0.0271   0.0147  0.0661   0.9553  0.9012   0.9957  0.9399

NC: non-coverage; CC: continuity correction
Table IV. Estimated coverage probabilities for 95 per cent confidence intervals calculated by 11 methods, for 9200 points in parameter space (PSPs). Determinants of coverage

Method                  min(m, n)        Min. expected     ψ in range                θ in range
                                         frequency
                        5–9      30–50   0–1      5–25     <0.1 or >0.9  0.4–0.6     0–0.05   0.5–1
Number of PSPs          1720     2360    3031     1875     1835          1907        1784     1422
1. Asympt, no CC        0.8063   0.9176  0.7981   0.9385   0.7416        0.9045      0.7604   0.8959
2. Asympt, CC           0.9470   0.9685  0.9616   0.9666   0.9665        0.9530      0.9791   0.9516
3. Haldane              0.9039   0.9276  0.8648   0.9480   0.8296        0.9474      0.8536   0.9437
4. Jeffreys–Perks       0.9617   0.9530  0.9649   0.9492   0.9793        0.9499      0.9782   0.9490
5. Mee                  0.9602   0.9527  0.9679   0.9486   0.9732        0.9507      0.9675   0.9532
6. Miettinen–Nurminen   0.9632   0.9542  0.9697   0.9505   0.9744        0.9530      0.9692   0.9553
7. True profile         0.9476   0.9460  0.9561   0.9463   0.9566        0.9440      0.9553   0.9457
8. ‘Exact’              0.9766   0.9628  0.9833   0.9551   0.9880        0.9612      0.9798   0.9670
9. ‘Mid-p’              0.9674   0.9542  0.9745   0.9489   0.9803        0.9525      0.9734   0.9566
10. Score, no CC        0.9604   0.9584  0.9737   0.9486   0.9849        0.9468      0.9751   0.9497
11. Score, CC           0.9837   0.9750  0.9877   0.9682   0.9950        0.9698      0.9889   0.9715

CC: continuity correction
For method 10, the highest MNCP was 0.0660, but more extreme mesial non-coverage, up to 0.0998, occurred in the third evaluation with larger denominators (Table V). MNCP can approach 0.1685, corresponding to the limiting non-coverage for the score method for the single proportion,² when $\theta = L_1 - \varepsilon$, $\pi_2 = \varepsilon$, $m$ small and $n \to \infty$, where $L_1$ is the lower score limit for $1/m$. Method 11 produced its lowest CP, 0.9339, when $m = n = 8$, $\psi = 0.5160$, $\theta = 0.9233$, entirely from distal non-coverage; this compares unfavourably with 0.949 for the corresponding method for the single proportion.² Both methods 10 and 11 yielded intervals erring towards mesial location.
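The limiting value 0.1685 can be reproduced with a short calculation, under the following assumed reasoning (not spelled out in the text). With $\pi_2 = \varepsilon$ and $n \to \infty$, $b = 0$ almost surely and the upper score limit $u_2 \to 0$, so the lower limit of method 10 reduces to the single-proportion score limit $l_1$ for $a/m$. Every $a \ge 1$ then gives a lower limit of at least $L_1 > \theta$, so MNCP tends to $\Pr[A \ge 1] = 1 - (1 - L_1)^m$, with $\pi_1 = \theta + \pi_2 \to L_1$. A Python sketch with illustrative names:

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)


def wilson_lower(x, n, z=Z):
    """Lower Wilson score limit for x/n."""
    p = x / n
    return (2 * n * p + z * z - z * math.sqrt(z * z + 4 * n * p * (1 - p))) / (2 * (n + z * z))


def limiting_mncp(m, z=Z):
    """1 - (1 - L1)^m, with L1 the lower score limit for 1/m and pi1 -> L1."""
    l1 = wilson_lower(1, m, z)
    return 1 - (1 - l1) ** m


if __name__ == "__main__":
    for m in (5, 10, 50):
        print(m, round(limiting_mncp(m), 4))
```

For $m = 5$, the smallest denominator in the main evaluation, this gives about 0.1685; the value falls slowly as $m$ increases.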
Mean coverage was very similar whether $m$ and $n$ were equal, coprime or intermediate, except that for method 3 the mean CP was 0.9225 for $m \ne n$ but 0.9015 for $m = n$. Table IV illustrates the relation of mean coverage probability to other parameters: the lower of the two denominators $m$ and $n$; the lowest of the four expected frequencies $m\pi_1$, $m(1-\pi_1)$, $n\pi_2$ and $n(1-\pi_2)$; $\psi$; and $\theta$. Methods 8, 11 and 6 (but not 5) were conservative on average in all zones of the parameter space examined. As expected, for most methods the mean coverage became closer to the nominal 0.95 as $\min(m, n)$ increased. Method 2 was anti-conservative for $\min(m, n)$ from 5 to 9 and had mean CP 0.9628 for 10 to 19, which subsequently increased slightly. Method 7's mean coverage remained close to its overall mean of 0.9454. Method 11 remained very conservative for the larger values of $m$ and $n$ in the main evaluation.
The pattern according to minimum expected cell frequency (that is, $\min[m\pi_1, m(1-\pi_1), n\pi_2, n(1-\pi_2)]$) was generally similar, but more pronounced. Methods 2 and 7 were conservative on average even when the lowest expected frequency was under 1. For several methods whose overall mean coverage was over 0.95, the mean coverage dipped to slightly below 0.95 when all expected frequencies were over 5; this applies to methods 9, 10, 4 and 5, but apparently not to 6.
Table V. Estimated coverage probabilities for 95 per cent confidence intervals calculated by methods 10 and 11. Based on 1000 points in parameter space with 100 ≤ m ≤ 100,000, 100 ≤ n ≤ 100,000, 0 < ψ < 1 and 0 < θ < 1 − |2ψ − 1|

                    Coverage          Mesial non-coverage    Distal non-coverage
Method              Mean     Min      Mean      Max          Mean      Max
10. Score, no CC    0.9618   0.9002   0.0265    0.0998       0.0116    0.0657
11. Score, CC       0.9785   0.9520   0.0153    0.0435       0.0062    0.0274

CC: continuity correction
Most methods had coverage closer to 0.95 for mesial $\psi$ (0.4 to 0.6) than for distal $\psi$ (< 0.1 or > 0.9). Method 10 was slightly anti-conservative for $0.4 < \psi < 0.6$, and also for $\theta > 0.5$; method 4 was slightly anti-conservative for $0.2 < \psi < 0.8$, and for $\theta > 0.25$. Conversely, method 7 was conservative on average for $\psi$ outwith (0.1, 0.9) and for $\theta < 0.05$.
For method 1, the preponderance of distal non-coverage was gross for $\psi$ outwith (0.1, 0.9) (MNCP = 0.0112, DNCP = 0.2472) or $\theta < 0.05$. Conversely, for large $\theta$, non-coverage was predominantly mesial ($0.5 < \theta < 1$: MNCP = 0.0877, DNCP = 0.0164), with corresponding behaviour for mesial $\psi$; see examples (b) and (c) in Table II. Thus when $\theta$ is large, method 1 does not err on the safe side: the interval fails to exclude values of $\theta$ which are considerably too large. For other methods, the mesial–distal location was less dependent upon $\theta$ and $\psi$.
Overt overshoot was common using methods 1 (mean probability 0.0270) and 2 (0.0594), less so for methods 3 (0.0012) and 4 (0.0052). It occurs with probability approaching 1 for method 2 as θ → 1, and also for method 4 provided m ≠ n. For method 1, the maximum overshoot probability is lower, 0.7996, since as θ → 1, Pr[ZWI at 1] → 1. Latent overshoot occurred with probability around 0.5 for methods 3 and 4; some parameter space points produced latent overshoot with probability 1.
The Haldane method produced unilateral point estimate tethering with probability 0.0413 when m ≠ n, and a ZWI at 0 with probability 0.0471 when m = n. Method 1 produced a ZWI at 0 in both these cases, and also a ZWI at 1 with probability 0.0035. For both methods the probability of inappropriate tethering approaches 1 as ψ → 0 or 1.
Table III also shows coverage properties of 90 per cent and 99 per cent intervals, which were generally in line with the findings for 95 per cent intervals. Method 11 was strictly conservative for the chosen parameter space points at 90 per cent, but can be very anti-conservative at 99 per cent. The average coverage of method 4 became anti-conservative at 99 per cent.
In the second evaluation, using (m,n) pairs not included in the first, the coverage properties of
methods 10 and 11 were virtually identical to those in the main evaluation.
In the third evaluation (Table V), with denominators ranging from 100 to 100,000, the mean coverage was very similar to the main evaluation, but the minimum CPs were more favourable, and, contrastingly, the location of the interval was too distal. The lowest CP for method 10, 0.9002, resulted entirely from mesial non-coverage, with m = 140, n = 67,622, mπ₁ = 0.5349, nπ₂ = 8.9480; the highest DNCP was only 0.0657. In the main evaluation, the location of these intervals was too mesial, but this asymmetry disappeared in the zone θ < 0.05.
Variation in expected interval width (Table VI) between different methods is most marked when any of the expected frequencies mπ₁, m(1 − π₁), nπ₂ or n(1 − π₂) is low. The width is then least for method 1 or 3, largely on account of the high frequency of degeneracy.
Table VI. Average interval width for 95 per cent confidence intervals calculated by 11 methods, for selected parameter space points
m                         10      10      10      100     100     100     100     100     100
n                         10      10      10      10      10      10      100     100     100
π₁                        0.01    0.5     0.95    0.01    0.5     0.95    0.01    0.5     0.95
π₂                        0.01    0.5     0.05    0.01    0.5     0.05    0.01    0.5     0.05
1. Asympt, no CC          0.0702  0.8302  0.2407  0.0635  0.6177  0.1996  0.0493  0.2758  0.1188
2. Asympt, CC             0.2702  1.0296  0.3420  0.1735  0.7277  0.2618  0.0693  0.2958  0.1385
3. Haldane                0.0646  0.7640  0.4316  0.1819  0.5904  0.2906  0.0487  0.2732  0.1227
4. Jeffreys–Perks         0.3580  0.7679  0.4327  0.2624  0.5930  0.3246  0.0644  0.2732  0.1227
5. Mee                    0.5634  0.7737  0.4224  0.3286  0.5529  0.3507  0.0888  0.2732  0.1225
6. Miettinen–Nurminen     0.5840  0.7910  0.4371  0.3307  0.5549  0.3526  0.0891  0.2739  0.1228
7. True profile           0.3748  0.7990  0.3440  0.2233  0.5794  0.2871  0.0664  0.2745  0.1194
8. ‘Exact’                0.6298  0.8801  0.4661  0.3372  0.5885  0.3541  0.0919  0.2840  0.1296
9. ‘Mid-p’                0.5324  0.8128  0.4075  0.2978  0.5803  0.3384  0.0800  0.2749  0.1214
10. Score, no CC          0.5627  0.7231  0.4773  0.3289  0.5430  0.3522  0.0895  0.2707  0.1264
11. Score, CC             0.6945  0.8232  0.5744  0.4036  0.6121  0.4213  0.1061  0.2843  0.1398
CC: continuity correction
6. DISCUSSION
As for the single proportion, the only virtue of the simple asymptotic method is simplicity; the alignment of its coverage with 1 − α is merely nominal, an attribute of its method of construction, quite divorced from its attained coverage properties. Overt overshoot and inappropriate tethering are common, and coverage is not symmetric.
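A short Python sketch makes these two aberrations concrete. It assumes method 1 is the usual Wald form θ̂ ± z√(p₁(1 − p₁)/m + p₂(1 − p₂)/n) with p₁ = a/m and p₂ = b/n; the function names and aberration labels are illustrative, not the paper's code.

```python
import math
from statistics import NormalDist


def wald_difference(a: int, m: int, b: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Simple asymptotic (Wald) interval for theta = a/m - b/n, no continuity correction."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p1, p2 = a / m, b / n
    se = math.sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)
    theta_hat = p1 - p2
    return theta_hat - z * se, theta_hat + z * se


def aberrations(lower: float, upper: float) -> list[str]:
    """Flag the aberrations that can be read off the limits alone."""
    found = []
    if lower < -1 or upper > 1:
        found.append("overshoot")  # a limit lies outside the admissible range [-1, 1]
    if lower == upper:
        found.append("zero-width interval")  # happens when the estimated se is 0
    return found


# 9/10 versus 0/10: the upper limit exceeds 1
print(wald_difference(9, 10, 0, 10), aberrations(*wald_difference(9, 10, 0, 10)))
# 0/10 versus 0/10: the standard error is 0, so the interval collapses to the point 0
print(wald_difference(0, 10, 0, 10), aberrations(*wald_difference(0, 10, 0, 10)))
```

With 10/10 against 0/10 the same code returns the single point (1, 1), a ZWI at 1.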
Method 2 is a great improvement in terms of degree and symmetry of coverage, but the minimum coverage remains poor, and the improved mean coverage is attained at the cost of a still higher overshoot rate.
The Haldane and Jeffreys–Perks methods (3 and 4) attempt to overcome these deficiencies while maintaining closed-form tractability. The Haldane method suffers from poor mean and minimum coverage rates, severe asymmetry and proneness to aberrations. The Jeffreys–Perks method is a clear improvement, but still inadequate.
Methods 5 to 11 cannot produce any of the aberrations listed in Table V. The Mee and Miettinen–Nurminen methods (5 and 6) show slightly conservative mean coverage probability. Method 7, described⁶ as theoretically and empirically somewhat inferior to method 6, is slightly anti-conservative; all three have a rather poor minimum coverage probability, however.
Methods 8 and 9 largely achieve the performance they were designed to achieve, though method 8, as well as method 9, has a minimum coverage probability less than the nominal value. Both require lengthy source code, and also large amounts of computation time except when m and n are small.
Methods 10 and 11 are remarkable. They are computationally simpler than any of methods 5 to 9, and tractable for very large m and n. At least for 95 per cent and 90 per cent intervals, they
attain a coverage distribution inferior only to their highly complex tail-area profile-based counterparts, methods 9 and 8, respectively; the coverage distribution for method 10 is broadly similar to that of the fairly complex methods 5 and 6. (Method 10 is strictly less effective than method 9, in that its mean CP is higher but its minimum CP lower; similarly for method 11 in comparison with method 8.) They avoid all aberrations, including the anomalous asymptotic behaviour of methods 8 and 9 described in Section 1.
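Because method 10 needs only two Wilson intervals, it fits in a few lines. The Python sketch below assumes the usual statement of this hybrid score interval: with Wilson limits (l₁, u₁) for p₁ = a/m and (l₂, u₂) for p₂ = b/n, the limits are θ̂ − √((p₁ − l₁)² + (u₂ − p₂)²) and θ̂ + √((u₁ − p₁)² + (p₂ − l₂)²). Function names are illustrative; method 11 would substitute continuity-corrected Wilson limits, which are not shown here.

```python
import math
from statistics import NormalDist


def wilson(r: int, n: int, z: float) -> tuple[float, float]:
    """Wilson score interval for the single proportion r/n."""
    centre = (r + z * z / 2) / (n + z * z)
    half = z * math.sqrt(r * (n - r) / n + z * z / 4) / (n + z * z)
    return centre - half, centre + half


def hybrid_score_difference(a: int, m: int, b: int, n: int,
                            conf: float = 0.95) -> tuple[float, float]:
    """Sketch of method 10: combine Wilson limits for a/m and b/n into limits for theta = a/m - b/n."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p1, p2 = a / m, b / n
    l1, u1 = wilson(a, m, z)
    l2, u2 = wilson(b, n, z)
    theta_hat = p1 - p2
    lower = theta_hat - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = theta_hat + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


print(hybrid_score_difference(56, 70, 48, 80))
```

For illustrative counts 56/70 versus 48/80 this gives approximately 0.0524 to 0.3339. With a = b = 0 it reduces to (−z²/(n + z²), z²/(m + z²)), the case RZ limits given in Appendix II.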
In Newcombe,² the Wilson method for the single proportion produced an interval that was located too mesially, with a distal non-coverage rate approximately 0.015 greater than the mesial. This is propagated into a similar difference, in both direction and size, between left and right non-coverage rates for method 10 in the main evaluation here. The same applies to the corresponding continuity-corrected methods, with a distal–mesial non-coverage preponderance of approximately 0.01. The simple asymptotic method for the single proportion yielded an interval that was too distal, corresponding to the propensity to overshoot. Nevertheless, in the present evaluation method 1 intervals were located too mesially: the set of parameter space points was weighted away from high θ, for which location is too distal, and ZWIs were common at θ̂ = 0 but rare at θ̂ = 1. Gart and Nam²⁰ found that a skewness correction, after Bartlett,²¹ led to little improvement upon the Mee method.⁵
7. CONCLUSION
For the difference between independent proportions, a novel pair of methods (10 and 11) is presented; these are computationally very tractable irrespective of m and n, free from aberrations, and achieve better coverage properties than any except the most complex methods. In the absence of off-the-shelf software, these methods are strongly recommended over methods 1 and 2, the only ones commonly in use. Nevertheless, software producers are strongly urged to provide readily available routines for appropriate methods for the unpaired difference case, as also for the paired difference case and indeed the single proportion case.
APPENDIX I: MULTIPLICATIVE SCALE SYMMETRY OF CONDITIONAL
INTERVALS DERIVED FROM THE WILSON SCORE INTERVAL
In a preceding paper² we showed that the Wilson score interval⁴ is symmetrical on a logit scale.
Interval estimates for certain types of ratio may be derived from intervals for proportions. These methods are conditional in nature, in that the appropriate denominator n relates to only a subset of the N individuals studied, and has itself arisen as a result of sampling. From an appropriate interval estimate about p = r/n we may derive a corresponding interval for any monotonic function of it, in particular the odds r/(n − r). In the following examples, when the Wilson interval is used for r/n, the resulting derived interval is symmetrical on the multiplicative scale.
The simplest case²²⁻²⁴ is the ratio of two Poisson counts, r₁/r₂. Armitage and Berry²³ considered the ratio of two bacterial colony counts of 13 and 31. A Wilson 95 per cent interval for the proportion 13/44 would be 0.1816 to 0.4422. This corresponds to an interval 0.1816/(1 − 0.1816) = 0.2218 to 0.4422/(1 − 0.4422) = 0.7928 for r₁/r₂, which is symmetrical about the point estimate 0.4194 on a multiplicative scale.
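The calculation can be checked with a small Python sketch (function names are illustrative): it computes the Wilson interval for r₁/(r₁ + r₂) and maps each limit through p ↦ p/(1 − p).

```python
import math
from statistics import NormalDist


def wilson(r: int, n: int, z: float) -> tuple[float, float]:
    """Wilson score interval for the single proportion r/n."""
    centre = (r + z * z / 2) / (n + z * z)
    half = z * math.sqrt(r * (n - r) / n + z * z / 4) / (n + z * z)
    return centre - half, centre + half


def count_ratio_interval(r1: int, r2: int, conf: float = 0.95) -> tuple[float, float]:
    """Conditional interval for r1/r2: Wilson limits for r1/(r1 + r2) mapped to the odds scale."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    lo, hi = wilson(r1, r1 + r2, z)
    return lo / (1 - lo), hi / (1 - hi)


lo, hi = count_ratio_interval(13, 31)
# Multiplicative symmetry: the geometric mean of the limits equals the point estimate r1/r2
print(round(lo, 4), round(hi, 4), round(math.sqrt(lo * hi), 4), round(13 / 31, 4))  # 0.2218 0.7928 0.4194 0.4194
```

The same transform applied to the discordant-pair counts f = 38, g = 14 discussed below gives about 1.48 to 4.96.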
Furthermore, interest sometimes centres on the ratio of two proportions derived from independent samples, p₁/p₂, where pᵢ = rᵢ/nᵢ, i = 1, 2. Analogously, two rates expressed per person-year at
risk may be contrasted by their ratio; here the odds ratio is inapplicable, as no meaning attaches to 1 − pᵢ, i = 1, 2. Some existing methods are summarized by Miettinen and Nurminen⁶ and Rothman.²⁵
A simple method, appropriate for use when both proportions or rates are very small, may likewise be derived from an interval estimate for a single proportion. For example, 41 cases of breast carcinoma developed in a series²⁵,²⁶ exposed to radiation, with 28,010 woman-years at risk, 1.856 times the rate in controls, which was 15/19,017. We may base the calculation on any suitable interval for the proportion (41/56) of cases that arose in the exposed group. A Wilson interval for this proportion would be 0.604 to 0.830. The corresponding limits for the odds (41:15) are 1.526 and 4.897, which are then divided by the ratio of the denominators (28,010/19,017 ≈ 1.473) to yield limits 1.036 and 3.325 for the rate ratio. The method may be refined to standardize for age or other confounders, by adjusting the ratio of the denominators.²⁷
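A minimal Python sketch of this rate-ratio calculation, with illustrative function names: a Wilson interval for the exposed share of cases, mapped to the odds scale and then divided by the ratio of the person-time denominators.

```python
import math
from statistics import NormalDist


def wilson(r: int, n: int, z: float) -> tuple[float, float]:
    """Wilson score interval for the single proportion r/n."""
    centre = (r + z * z / 2) / (n + z * z)
    half = z * math.sqrt(r * (n - r) / n + z * z / 4) / (n + z * z)
    return centre - half, centre + half


def rate_ratio_interval(cases1: int, time1: float, cases2: int, time2: float,
                        conf: float = 0.95) -> tuple[float, float]:
    """Interval for (cases1/time1)/(cases2/time2) via a Wilson interval for cases1/(cases1 + cases2)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    lo, hi = wilson(cases1, cases1 + cases2, z)
    scale = time1 / time2  # ratio of the person-time denominators
    return lo / (1 - lo) / scale, hi / (1 - hi) / scale


lo, hi = rate_ratio_interval(41, 28010, 15, 19017)
print(round(lo, 3), round(hi, 3))  # 1.036 3.325
```

Dividing both odds limits by the same constant keeps the interval symmetrical about the point estimate 1.856 on the multiplicative scale.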
Similarly, in an individually matched comparative retrospective study, the odds ratio ω is estimated by f/g, the ratio of the counts of discordant pairs. If (L, U) is a confidence interval for f/(f + g), then L/(1 − L) and U/(1 − U) serve²⁸ as corresponding limits for ω. For example, Mills et al.²⁹ studied smoking habits in 106 psoriatics and individually matched controls, and obtained f = 38, g = 14, ω̂ = 2.71. A score-based 95 per cent confidence interval for this odds ratio would be 1.48 to 4.96.
Coverage properties of such derived intervals follow from those of the interval method used for r/n. The Clopper–Pearson method³⁰ is often used,²⁸ and its conservatism carries across to conditional intervals derived from it. While choice of coverage intention is more important than symmetry, the latter is nonetheless a noteworthy property.
APPENDIX II: PROFILING OF ψ, THE NUISANCE PARAMETER, AS ψ_θ
When applied directly via the likelihood function Λ(θ, ψ) = Pr[A = a & B = b | θ, ψ], by obtaining

$$\{\theta : \ln\Lambda(\theta, \psi_\theta) - \ln\Lambda(\hat\theta, \hat\psi) \ge -z^2/2\},$$

the profile likelihood approach has attracted censure as being in general anti-conservative.⁹ Nevertheless the Mee and Miettinen–Nurminen methods for the unpaired difference, selected as optimal by Beal,³ and the optimal methods for the paired difference¹⁰,¹⁹ all involve substitution of the appropriate profile estimate for the nuisance parameter.
For the unpaired difference, with reparameterization as in Table I (so that π₁ = ψ + θ/2 and π₂ = ψ − θ/2), the log-likelihood reduces (within an additive constant) to

$$\ln\Lambda = a\ln(\psi + \theta/2) + b\ln(\psi - \theta/2) + c\ln(1 - \psi - \theta/2) + d\ln(1 - \psi + \theta/2),$$

with the understanding that terms corresponding to empty cells are omitted. The constraints 0 ≤ πᵢ ≤ 1, i = 1, 2, translate into restricting evaluation of ln Λ to within the bounding rhombus

$$\tfrac{1}{2}|\theta| \le \psi \le 1 - \tfrac{1}{2}|\theta|.$$

The four diagonal boundaries of this are ψ = −θ/2, θ/2, 1 − θ/2 and 1 + θ/2. The likelihood is zero on each of these boundaries, unless the corresponding cell entry (a, b, c or d, respectively) is zero.
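The log-likelihood and its support can be written down directly. In this Python sketch (names illustrative), a cell with zero count contributes nothing, a non-zero cell whose factor reaches 0 makes the likelihood zero (log −∞), and points outside the rhombus are rejected.

```python
import math


def log_lik(theta: float, psi: float, a: int, b: int, c: int, d: int) -> float:
    """ln Lambda(theta, psi) up to an additive constant, with pi1 = psi + theta/2, pi2 = psi - theta/2."""
    if not (abs(theta) / 2 <= psi <= 1 - abs(theta) / 2):
        return -math.inf  # outside the bounding rhombus
    total = 0.0
    for count, arg in ((a, psi + theta / 2), (b, psi - theta / 2),
                       (c, 1 - psi - theta / 2), (d, 1 - psi + theta / 2)):
        if count == 0:
            continue  # empty cell: its term is omitted
        if arg <= 0:
            return -math.inf  # likelihood is zero on this edge
        total += count * math.log(arg)
    return total


# 7/10 versus 3/10: the MLE is theta = 0.4, psi = 0.5
print(log_lik(0.4, 0.5, 7, 3, 3, 7), log_lik(0.3, 0.5, 7, 3, 3, 7))
```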
We distinguish four situations, according to the pattern of zero cells:
NZ: no zero cells; all four edges precipitous.
OZ: one cell zero, for example, c = 0, abd > 0; upper right edge ψ = 1 − θ/2 is now available.
RZ: two cells in same row zero, for example, a = b = 0, cd > 0; lower edges ψ = ±θ/2 are now available.
DZ: two cells on same diagonal zero, for example, b = c = 0, ad > 0; right-hand edges ψ = θ/2 and 1 − θ/2 are now available.
These represent all situations allowed, as m and n must be greater than zero.
Case NZ: abcd > 0.
The log-likelihood function, regarded as a function of either θ or ψ, is a sum of terms of the form ln(jμ + k), where μ can represent either θ or ψ, and j ≠ 0, whence

$$\frac{\partial^2}{\partial\mu^2}\ln(j\mu + k) = -\frac{j^2}{(j\mu + k)^2} < 0.$$

So both ∂²/∂ψ² ln Λ and ∂²/∂θ² ln Λ are negative throughout the bounding rhombus; the log-likelihood surface is smoothly concave (dome-shaped), with chasms to −∞ at all four edges in all cases. Thus there is a unique maximizing ψ_θ for each θ with −1 < θ < 1, and 100(1 − α) per cent profile likelihood limits for θ always exist, away from the boundary, for any α > 0. The derivative ∂/∂ψ ln Λ takes the form

$$\frac{\partial}{\partial\psi}\ln\Lambda = \frac{C(\psi)}{\prod_{\pm}(\psi \pm \theta/2)(1 - \psi \pm \theta/2)},$$

where C(ψ) is a cubic in ψ, but maximization is more readily performed iteratively. The MLE point is θ̂ = a/m − b/n, ψ̂ = (a/m + b/n)/2; at θ = 0, ψ_θ reduces to ψ₀ = (a + b)/(m + n), the weighted proportion; as θ → +1 or −1, ψ_θ → ½.
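Because ∂/∂ψ ln Λ is strictly decreasing in ψ in case NZ, ψ_θ can be found by bisection; and since the profile is then concave in θ, the profile likelihood set {θ : ln Λ(θ, ψ_θ) − ln Λ(θ̂, ψ̂) ≥ −z²/2} can be found by bisection on each side of θ̂. The Python sketch below is one way to do this; bisection is our choice of iterative method, not necessarily the one used in the paper, the names are illustrative, and only case NZ is covered.

```python
import math
from statistics import NormalDist


def log_lik(theta, psi, a, b, c, d):
    """ln Lambda(theta, psi) for case NZ (all four cells positive)."""
    return (a * math.log(psi + theta / 2) + b * math.log(psi - theta / 2)
            + c * math.log(1 - psi - theta / 2) + d * math.log(1 - psi + theta / 2))


def score_psi(theta, psi, a, b, c, d):
    """Partial derivative of ln Lambda with respect to psi; strictly decreasing in psi."""
    return (a / (psi + theta / 2) + b / (psi - theta / 2)
            - c / (1 - psi - theta / 2) - d / (1 - psi + theta / 2))


def profile_psi(theta, a, b, c, d, iters=100):
    """psi_theta by bisection on the score across the rhombus slice at this theta."""
    lo, hi = abs(theta) / 2, 1 - abs(theta) / 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if score_psi(theta, mid, a, b, c, d) > 0:
            lo = mid  # likelihood still rising in psi: the maximum lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


def profile_interval(a, m, b, n, conf=0.95, iters=100):
    """Profile likelihood limits for theta = a/m - b/n (case NZ only)."""
    c, d = m - a, n - b
    assert min(a, b, c, d) > 0, "sketch covers case NZ only"
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    theta_hat, psi_hat = a / m - b / n, (a / m + b / n) / 2
    cutoff = log_lik(theta_hat, psi_hat, a, b, c, d) - z * z / 2

    def inside(theta):
        return log_lik(theta, profile_psi(theta, a, b, c, d), a, b, c, d) >= cutoff

    def edge(inner, outer):
        # bisect between a point inside the set (inner) and one outside it (outer)
        for _ in range(iters):
            mid = (inner + outer) / 2
            if inside(mid):
                inner = mid
            else:
                outer = mid
        return (inner + outer) / 2

    return edge(theta_hat, -1.0), edge(theta_hat, 1.0)


print(profile_interval(56, 70, 48, 80))
```

Bisection is slow but robust here, because both searched functions are monotone on their brackets and the endpoints θ = ±1 are never evaluated.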
Case OZ: c = 0, abd > 0, say. π̂₁ = 1 > π̂₂.
The attainable region here is the interior of the rhombus plus the edge ψ = 1 − θ/2, 0 < θ < 1. On this edge,

$$\frac{\partial}{\partial\psi}\ln\Lambda = a + \frac{b}{1 - \theta} - \frac{d}{\theta},$$

which tends to −∞ as θ → 0 and to +∞ as θ → 1. It has a unique zero at

$$\theta = \theta^* = \frac{a + n - \sqrt{(a + n)^2 - 4ad}}{2a},$$

where 0 < θ* < 1.
For θ* ≤ θ < 1, ψ_θ is simply 1 − θ/2. For −1 < θ < θ*, ψ_θ is within the rhombus and is (B + √(B² − 4AC))/(2A), where A = a + n, B = a(1 + θ) + b, C = {(a − b)(1 + θ/2) − dθ/2}θ/2.
The MLE point is θ̂ = d/n, where θ* < θ̂ < 1, and ψ̂ = 1 − θ̂/2, on the permitted edge.
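The closed forms for case OZ can be checked numerically. This Python sketch (names illustrative) computes θ* and ψ_θ from the formulas above; the checks confirm that the interior root meets the edge at θ* and sets the score to zero below it.

```python
import math


def theta_star(a: int, b: int, d: int) -> float:
    """Case OZ (c = 0, abd > 0): zero of a + b/(1 - theta) - d/theta on the edge psi = 1 - theta/2."""
    n = b + d
    return (a + n - math.sqrt((a + n) ** 2 - 4 * a * d)) / (2 * a)


def psi_theta_oz(theta: float, a: int, b: int, d: int) -> float:
    """Profile estimate psi_theta in case OZ, using the closed forms in the text."""
    n = b + d
    if theta >= theta_star(a, b, d):
        return 1 - theta / 2  # the maximum sits on the permitted edge
    A = a + n
    B = a * (1 + theta) + b
    C = ((a - b) * (1 + theta / 2) - d * theta / 2) * theta / 2
    return (B + math.sqrt(B * B - 4 * A * C)) / (2 * A)  # interior root of the score


# Example counts a = 5, b = 3, c = 0, d = 7 (so m = 5, n = 10)
print(theta_star(5, 3, 7), psi_theta_oz(0.2, 5, 3, 7), psi_theta_oz(0.9, 5, 3, 7))
```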
Case RZ: a = b = 0, cd > 0, say. π̂₁ = π̂₂ = θ̂ = ψ̂ = 0.
This is the most tractable case for methods involving ψ_θ, though conversely the one that would invalidate the conditional approach for interval estimation of θ. Here

$$\frac{\partial}{\partial\psi}\ln\Lambda \le 0 \quad\text{for all } \psi \text{ with } \tfrac{1}{2}|\theta| < \psi < 1 - \tfrac{1}{2}|\theta|,$$

so the likelihood is maximized by ψ_θ = ½|θ|, that is, on the two lower boundaries of the rhombus. Thus for θ > 0, ψ = ψ_θ implies π₁ = θ, π₂ = 0; for θ < 0, π₁ = 0, π₂ = −θ.
Standard 95 per cent intervals for methods 7, 8 and 9 are then (−1 + c^(1/n), 1 − c^(1/m)), where c = 0.1465, 0.025 and 0.05, respectively. These limits correspond to non-zero ones for the single proportions 0/n and 0/m (Miettinen and Nurminen,⁶ equation 5; Clopper and Pearson³⁰; Miettinen³¹). Methods 5 and 10 produce the limits (−z²/(n + z²), z²/(m + z²)), which correspond to Wilson⁴ limits for 0/n and 0/m; similarly for the analogous continuity-corrected methods.
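A short Python sketch of these closed-form limits (function name and labels illustrative). The general-confidence constants c = exp(−z²/2), α/2 and α are our extrapolation from the 95 per cent values quoted; at 95 per cent they reproduce c ≈ 0.1465, 0.025 and 0.05.

```python
import math
from statistics import NormalDist


def rz_limits(m: int, n: int, conf: float = 0.95) -> dict[str, tuple[float, float]]:
    """Limits for theta when a = b = 0 (case RZ), from the closed forms in the text."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    limits = {}
    # Methods 7, 8, 9: (-1 + c**(1/n), 1 - c**(1/m)); smaller c gives a wider interval
    for label, c in (("7 profile", math.exp(-z * z / 2)),
                     ("8 exact", alpha / 2),
                     ("9 mid-p", alpha)):
        limits[label] = (-1 + c ** (1 / n), 1 - c ** (1 / m))
    # Methods 5 and 10: Wilson limits for 0/n and 0/m
    limits["5/10 score"] = (-z * z / (n + z * z), z * z / (m + z * z))
    return limits


for label, (lo, hi) in rz_limits(20, 30).items():
    print(f"{label:>11}: ({lo:.4f}, {hi:.4f})")
```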
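Case DZ: a = m, b = 0, say. θ̂ = π̂₁ − π̂₂ = 1 − 0 = 1.
In this situation, a good method is expected to produce unilateral tethering. The right-hand boundaries of the rhombus, ψ = θ/2 and ψ = 1 − θ/2, are now attainable. The form of ψ_θ here depends on the relative sizes of m and n.
If m = n, ψ_θ = ½ for all θ with −1 < θ < 1.
Otherwise, ψ_θ is a pair of intersecting straight lines:

$$m > n:\quad \psi_\theta = \begin{cases} 1 - \theta/2, & n/m \le \theta \le 1,\\[4pt] \dfrac{m + (m - n)\theta/2}{m + n}, & -1 \le \theta \le n/m; \end{cases}$$

$$m < n:\quad \psi_\theta = \begin{cases} \theta/2, & m/n \le \theta \le 1,\\[4pt] \dfrac{m + (m - n)\theta/2}{m + n}, & -1 \le \theta \le m/n. \end{cases}$$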
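The piecewise form of ψ_θ in case DZ can be coded directly. This Python sketch (name illustrative) follows the formulas above; the checks confirm that the two straight lines meet at θ = n/m or m/n and that ψ_θ stays within the rhombus.

```python
def psi_theta_dz(theta: float, m: int, n: int) -> float:
    """Profile estimate psi_theta in case DZ (a = m, b = 0, hence c = 0 and d = n)."""
    if m > n and theta >= n / m:
        return 1 - theta / 2  # on the edge pi1 = 1
    if m < n and theta >= m / n:
        return theta / 2  # on the edge pi2 = 0
    # interior solution of m/(psi + theta/2) = n/(1 - psi + theta/2); equals 1/2 when m == n
    return (m + (m - n) * theta / 2) / (m + n)


for m, n in ((10, 10), (20, 10), (10, 20)):
    print(m, n, [round(psi_theta_dz(t, m, n), 4) for t in (-0.5, 0.0, 0.5, 0.9)])
```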
ACKNOWLEDGEMENTS
I am grateful to Dr. E. C. Coles for helpful discussion, to Mr. C. M. Berners-Lee for assistance
with programming, and to two anonymous referees for many helpful suggestions.
REFERENCES
1. Vollset, S. E. ‘Confidence intervals for a binomial proportion’, Statistics in Medicine, 12, 809–824 (1993).
2. Newcombe, R. G. ‘Two-sided confidence intervals for the single proportion: comparison of seven methods’, Statistics in Medicine, 17, 857–872 (1998).
3. Beal, S. L. ‘Asymptotic confidence intervals for the difference between two binomial parameters for use with small samples’, Biometrics, 43, 941–950 (1987).
4. Wilson, E. B. ‘Probable inference, the law of succession, and statistical inference’, Journal of the American Statistical Association, 22, 209–212 (1927).
5. Mee, R. W. ‘Confidence bounds for the difference between two probabilities’, Biometrics, 40, 1175–1176 (1984).
6. Miettinen, O. S. and Nurminen, M. ‘Comparative analysis of two rates’, Statistics in Medicine, 4, 213–226 (1985).
7. Anbar, D. ‘On estimating the difference between two probabilities with special reference to clinical trials’, Biometrics, 39, 257–262 (1983).
8. Wallenstein, S. ‘A non-iterative accurate asymptotic confidence interval for the difference between two proportions’, Statistics in Medicine, 16, 1329–1336 (1997).
9. Cox, D. R. and Reid, N. ‘Parameter orthogonality and approximate conditional inference’, Journal of the Royal Statistical Society, Series B, 49, 1–39 (1992).
10. Newcombe, R. G. ‘Unconditional confidence interval methods for the difference between two binomial proportions based on paired data’, 16th International Biometric Conference, Hamilton, New Zealand, 1992.
11. Lancaster, H. O. ‘The combination of probabilities arising from data in discrete distributions’, Biometrika, 36, 370–382 (1949).
12. Stone, M. ‘The role of significance testing. Some data with a message’, Biometrika, 56, 485–493 (1969).
13. Berry, G. and Armitage, P. ‘Mid-P confidence intervals: a brief review’, Statistician, 44, 417–423 (1995).
14. Fleiss, J. L. Statistical Methods for Rates and Proportions, 2nd edn, Wiley, New York, 1981.
15. Goodfield, M. J. D., Andrew, L. and Evans, E. G. V. ‘Short-term treatment of dermatophyte onychomycosis with terbinafine’, British Medical Journal, 304, 1151–1154 (1992).
16. Altman, D. G. and Stepniewska, K. A. ‘Confidence intervals for small proportions, including zero’, British Medical Journal, in submission.
17. Blyth, C. R. and Still, H. A. ‘Binomial confidence intervals’, Journal of the American Statistical Association, 78, 108–116 (1983).
18. Wichmann, B. A. and Hill, I. D. ‘An efficient and portable pseudo-random number generator’, in Griffiths, P. and Hill, I. D. (eds), Applied Statistics Algorithms, Ellis Horwood, Chichester, 1985.
19. Newcombe, R. G. ‘Improved confidence interval methods for the difference between binomial proportions based on paired data’, submitted for publication.
20. Gart, J. J. and Nam, J-M. ‘Approximate interval estimation of the difference in binomial parameters: correction for skewness and extension to multiple tables’, Biometrics, 46, 637–643 (1990).
21. Bartlett, M. S. ‘Approximate confidence intervals. III. A bias correction’, Biometrika, 42, 201–204 (1955).
22. Kahn, H. A. ‘The relationship of reported coronary heart disease mortality to physical activity of work’, American Journal of Public Health, 53, 1058–1067 (1963).
23. Armitage, P. and Berry, G. Statistical Methods in Medical Research, 2nd edn, Wiley, New York, 1987.
24. Ederer, F. and Mantel, N. ‘Confidence limits on the ratio of two Poisson variables’, American Journal of Epidemiology, 100, 165–167 (1974).
25. Rothman, K. Modern Epidemiology, Little Brown, Boston, 1986.
26. Boice, J. A. and Monson, R. R. ‘Breast cancer in women after repeated fluoroscopic examination of the chest’, Journal of the National Cancer Institute, 59, 823–832 (1977).
27. Gardner, M. J. and Altman, D. G. (eds). Statistics with Confidence. Confidence Intervals and Statistical Guidelines, British Medical Journal, London, 1989.
28. Breslow, N. E. and Day, N. E. Statistical Methods in Cancer Research. 1. The Analysis of Case-Control Studies, IARC, Lyon, 1980.
29. Mills, C. M., Srivastava, E. D., Harvey, I. M., Swift, G. L., Newcombe, R. G., Holt, P. J. A. and Rhodes, J. ‘Smoking habits in psoriasis: a case-control study’, British Journal of Dermatology, 127, 18–21 (1992).
30. Clopper, C. J. and Pearson, E. S. ‘The use of confidence or fiducial limits illustrated in the case of the binomial’, Biometrika, 26, 404–413 (1934).
31. Miettinen, O. S. Theoretical Epidemiology, Wiley, New York, 1985, pp. 120–121.
SIM 779
890 R. NEWCOMBE
Statist.Med.17, 873890 (1998)(1998 John Wiley & Sons, Ltd.
... The alternative methods used to calculate the CI for proportions will be Wald, Wald continuity corrected, Wald interval with an adjustment according to Agresti and Caffo, 9 Miettinen and Nurminen 10 and Newcombe or Wilson score. 11 The Wald interval is the most basic one and performs well if the proportions, for example, patients with an event, are close to 0.5 because it assumes a normal sampling distribution. If not, the coverage of the CI is poor. ...
Article
Full-text available
Background The reporting of randomised controlled non-inferiority (NI) drug trials is poor with less than 50% of published trials reporting a justification of the NI margin. This is despite the introduction of the Consolidated Standards of Reporting Trials (CONSORT) extension on reporting of NI and equivalence in randomised trials. It is critical to set the appropriate NI margin as this choice dictates the conclusions of the trial. Methods to estimate the margin are heterogeneous but generally based on clinical judgement and statistical reasoning, and hence tailored to each clinical situation. Yet an appraisal of NI in clinical trials has not been undertaken. Therefore the aim of this systematic review is to assess the reporting and methodological quality of defining the NI margin. Surgical NI trials have been chosen as our prototype to assess this. Methods We will conduct a systematic review of published randomised controlled trials in abdominal surgery that use an NI design. Key eligibility criteria will be: surgical intervention in at least one trial arm; adult patients and a sample size of 100 or more. Ovid MEDLINE, EMBASE and the Cochrane Central Register of Controlled Trials will be searched from inception until the date of the search. Identified studies will be assessed for reporting according to the CONSORT recommendations. The outcomes are the description of the methods for defining the NI margin, and the robustness of the NI margin estimation. The latter will be based on simulations using alternative assumptions for model parameters. The results of the simulation will be compared with the trial authors’ conclusions. Anticipated results The review will describe and appraise the design and reporting of surgical NI trials including shortcomings thereof and allow a comparison with pharmaceutical trials. 
These findings will inform researchers on the appropriate design and pitfalls when conducting surgical randomised controlled trials with an NI design and promote thorough and standardised reporting of study findings. Ethics and dissemination Ethical approval is not required and any changes to the protocol will be communicated via the registration platform. The final manuscript will be submitted to a journal for publication and the findings will be disseminated through conference presentations to inform researchers and the public.
Article
Full-text available
Objective This study aimed to compare the accuracy of adrenal vein sampling (AVS) and ⁶⁸Ga-Pentixafor positron emission tomography (PET)/computed tomography (CT) in predicting the functional outcome from adrenalectomy in patients with primary aldosteronism (PA), and to investigate the utility of ⁶⁸Ga-Pentixafor PET/CT in laparoscopic partial adrenalectomy. Methods We prospectively enrolled patients diagnosed with PA. All patients underwent ⁶⁸Ga-Pentixafor PET/CT and AVS. Patient management was discussed by a multidisciplinary team. Ninety surgically eligible patients were randomized to partial (n = 45) or total adrenalectomy (n = 45), while 97 received medical therapy. Postoperative Aldosterone Surgical Outcome (PASO) criteria served as the gold standard. Results In total, 187 patients with PA were examined using ⁶⁸Ga-pentixafor PET/CT and AVS, and 90 patients who underwent surgery were included in the analysis. The accuracy of ⁶⁸Ga-Pentixafor PET/CT in correctly predicting biomedical partial or complete success, biomedical complete success, clinical partial or complete success, and clinical complete success was 78.89%, 77.78%, 77.67%, and 67.78%, respectively. For AVS, the accuracies were 74.44%, 73.33%, 70.00%, and 54.44%, respectively. ⁶⁸Ga-pentixafor PET/CT was not significantly superior, but the differences lay within the pre-specified − 10% margin for non-inferiority. Functional outcomes did not significantly differ between the partial and total adrenalectomy groups. Multivariable logistic analysis demonstrated that the lower highest systolic blood pressure and higher SUVmax were independent factors of complete clinical success. The AUC for SUVmax in determining clinical complete success was 0.896, with an optimal cutoff value of 9.8. Subgroup analysis showed no functional outcome difference between laparoscopic partial and total adrenalectomy for patients with an SUVmax over 9.8. 
However, for those with an SUVmax below 9.8, laparoscopic total adrenalectomy yielded better results than laparoscopic partial adrenalectomy. Conclusion The accuracy of ⁶⁸Ga-Pentixafor PET/CT is comparable to the conventional, invasive method of AVS in forecasting the functional outcomes of adrenalectomy. SUVmax is an independent factor in determining complete clinical success and can potentially predict the functional outcome of laparoscopic adrenalectomy. It is suggested that laparoscopic partial adrenalectomy be performed on individuals who present an SUVmax value exceeding 9.8. The ability of ⁶⁸Ga-Pentixafor PET/CT to localize aldosterone-producing adenoma ultimately paves the way for the use of more focal treatment options. Trial registration ChiCTR2300070830. Registered 23 April 2023. Graphical Abstract
Article
Full-text available
Background Fractional doses of vaccine to protect against COVID-19 offer the potential to expand vaccine availability, reduce side effects, and enhance vaccination campaign efficiency. This study aimed to assess the immune response and safety of fractional doses of SARS-CoV-2 booster vaccines compared to full doses in immunocompetent adults aged 18–60 who had previously received a full series of Sinovac, AZD1222 (AstraZeneca), or BNT162b2 (Pfizer/BioNTech). Methods This trial was structured as a parallel-group, double-blind, randomised Phase IV non-inferiority study, carried out in Campo Grande, Midwest, Brazil. After obtaining consent, eligible participants were randomised to one of 5–6 study arms, depending on their priming vaccine. Participants were followed for 21–60 days after vaccination through in-person visits and remote contact for blood collection and safety evaluation. Anti-spike binding IgG antibodies were measured by ELISA. The primary outcome was the difference in seroresponse rates between the full and fractional doses, with a non-inferiority threshold of 10%. Findings A total of 1451 participants were randomised and administered booster vaccines between 5 July and 3 October, 2022. A half dose of BNT162b2 met the non-inferiority threshold, compared to a full dose in the Sinovac and AZD1222 primed groups. Sinovac induced an inferior response compared to AZD1222 and BNT162b2 full or fractional dose boosters in participants primed with Sinovac. Fractional booster doses of BNT162b2 consistently resulted in higher seroresponse rates (ranging from 35.4% to 78.3%) compared to fractional boosters of AZD1222 (ranging from 10.0% to 44.7%) or a full dose of Sinovac (4.2%). Both full and fractional dose vaccines were generally well tolerated. Local and systemic adverse events occurred across all treatment arms in line with expectations, with nine serious adverse events reported, none of which were determined to be related to study vaccination. 
Interpretation Our data show that the immunogenicity of booster vaccines depends on the initial vaccine, baseline antibody levels, and the booster vaccine used. Fractional doses of BNT162b2 and AZD1222 were non-inferior to a full Sinovac booster in individuals primed with Sinovac. However, fractional doses of BNT162b2 were not non-inferior in BNT162b2-primed individuals, and AZD1222 fractional doses were only non-inferior in the AZD1222 priming arm. We advise against Sinovac as a booster. Fractional doses of BNT162b2 or AZD1222 remain practical alternatives for Sinovac-primed populations in resource-limited settings. Funding 10.13039/100016302Coalition for Epidemic Preparedness Innovations (CEPI)/10.13039/100004107Sabin Vaccine Institute.
Article
Objectives To assess whether use of an artificial intelligence (AI) model for mammography could result in more longitudinally consistent breast density assessments compared with interpreting radiologists. Methods The AI model was evaluated retrospectively on a large mammography dataset including 50 sites across the United States from an outpatient radiology practice. Exams were acquired on Hologic imaging systems between 2016 and 2021 and were interpreted by 39 radiologists (36% fellowship trained; years of experience: 2-37 years). Longitudinal patterns in four-category breast density and binary breast density (non-dense vs. dense) were characterized for all women with at least three examinations (61,177 women; 214,158 examinations) as constant, descending, ascending, or bi-directional. Differences in longitudinal density patterns were assessed using paired proportion hypothesis testing. Results The AI model produced more constant (p<.001) and fewer bi-directional (p<.001) longitudinal density patterns compared to radiologists (AI: constant 81.0%, bi-directional 4.9%; radiologists: constant 56.8%, bi-directional 15.3%). The AI density model also produced more constant (p<.001) and fewer bi-directional (p<.001) longitudinal patterns for binary breast density. These findings held in various subset analyses, which minimize a) change in breast density (post-menopausal women, women with stable image-based BMI), b) inter-observer variability (same radiologist), and c) variability by radiologist’s training level (fellowship-trained radiologists). Conclusions AI produces more longitudinally consistent breast density assessments compared with interpreting radiologists. Advances in knowledge Our results extend the advantages of AI in breast density evaluation beyond automation and reproducibility, showing a potential path to improved longitudinal consistency and more consistent downstream care for screened women.
Article
Full-text available
Background Attention deficit hyperactivity disorder (ADHD) and behavioral addictions (BAs) are highly comorbid but little is known about the effect of anti-ADHD medications on behavioral addiction symptoms. Thus, the aim of this naturalistic prospective study was to investigate the long-term changes on BAs symptoms among methylphenidate-treated adults with a primary diagnosis of ADHD. Methods 37 consecutive adult ADHD outpatients completed a baseline and follow-up assessment of ADHD, mood and BAs symptoms (internet, shopping, food, sex addictions and gambling disorder) after one year of methylphenidate (flexible dose) treatment. Results Internet addiction test scores pre-treatment were significantly higher than post-treatment scores ( p < 0.001). The same trend was seen for the shopping addiction ( p = 0.022), food addiction scores ( p = 0.039) and sex addiction scores ( p = 0.047). Gambling disorder scores did not differ pre and post treatment since none of the included patients reported significant gambling symptoms at baseline. The rate of ADHD patients with at least one comorbid BA was reduced after methylphenidate treatment (51.4% vs 35.1%). The correlation analyses showed a moderate positive correlation between the changes in sluggish cognitive tempo symptoms, cognitive impulsivity, mood and anxiety symptoms and changes in internet addiction symptoms. Conclusions This is the first study showing that after one-year of treatment with methylphenidate, adult ADHD patients show a significant reduction on internet, food, shopping and sex addiction symptoms. Further controlled studies with larger samples should replicate these preliminary results and elucidate the role of methylphenidate and other moderator factors (such as concomitant psychological treatments or lifestyle habits changes) on BAs improvements.
Article
The integration of quantitative trait loci (QTLs) with disease genome-wide association studies (GWASs) has proven successful in prioritizing candidate genes at disease-associated loci. QTL mapping has focused on multi-tissue expression QTLs or plasma protein QTLs (pQTLs). We generated a cerebrospinal fluid (CSF) pQTL atlas by measuring 6,361 proteins in 3,506 samples. We identified 3,885 associations for 1,883 proteins, including 2,885 new pQTLs, demonstrating unique genetic regulation in CSF. We identified CSF-enriched pleiotropic regions on chromosome (chr) 3q28 near OSTN and chr19q13.32 near APOE that were enriched for neuron specificity and neurological development. We integrated our associations with Alzheimer’s disease (AD) through a proteome-wide association study (PWAS), colocalization, and Mendelian randomization, and identified 38 putative causal proteins, 15 of which have drugs available. Finally, we developed a proteomics-based AD prediction model that outperforms genetics-based models. These findings will be instrumental in further understanding the biology of brain and neurological traits and in identifying causal and druggable proteins for them.
Article
Background: A serum-free, highly purified Vero rabies vaccine-next generation (PVRV-NG2) is under development. We conducted a Phase III trial to describe the safety and immunogenicity profiles of PVRV-NG2 compared with those of the licensed purified Vero rabies vaccine (PVRV) in a simulated rabies post-exposure prophylaxis (PEP) Zagreb regimen in Thailand. Methods: Healthy adults aged ≥18 years (n=201) were randomized in a 2:1 ratio to receive PVRV-NG2 or PVRV in a rabies PEP Zagreb regimen (Days 0, 7, and 21 [2-1-1]), with concomitant human rabies immunoglobulin (HRIG) on Day 0. Immunogenicity endpoints included the proportion of participants with rabies virus neutralizing antibody (RVNA) titers ≥0.5 IU/mL on Days 0, 14, and 35. Safety outcomes were also assessed. Results: A total of 199 participants completed the study (PVRV-NG2 n=133, PVRV n=66). In the PVRV-NG2 and PVRV groups, respectively, 91.0% (95% CI: 84.1, 95.6) and 94.6% (95% CI: 85.1, 98.9) had RVNA titers ≥0.5 IU/mL at Day 14, increasing to 100% (95% CI: 96.8, 100) and 100% (95% CI: 93.5, 100) by Day 35. Both vaccines had a similar safety profile, and there were no safety concerns. Conclusions: PVRV-NG2 showed acceptable safety and immunogenicity profiles when co-administered with HRIG in a simulated PEP Zagreb regimen in healthy adults in Thailand. Trial registration: ClinicalTrials.gov Identifier: NCT04594551; WHO: U1111-1238-1726
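The abstract reports 95% CIs for single proportions, including intervals for 100% responders, but does not say how they were computed. The exact (Clopper–Pearson) interval is one common choice and is sketched here only as an assumption, not as the trial's stated method. The sketch inverts binomial tail probabilities by bisection using only the standard library. The helper names are hypothetical.

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05, tol: float = 1e-10) -> tuple[float, float]:
    """Exact two-sided (Clopper-Pearson) interval for x/n, found by bisection."""

    def solve(f) -> float:
        # f must be increasing in p on [0, 1]; return the root of f(p) = 0
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit solves P(X >= x | p) = alpha/2; P(X >= x) increases with p
    lower = 0.0 if x == 0 else solve(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2; P(X <= x) decreases with p, so negate
    upper = 1.0 if x == n else solve(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper
```

When every participant responds (x = n), the lower limit has the closed form (α/2)^(1/n), so the bisection is only needed for intermediate counts.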
Article
Background: Obesity drives the development of metabolic disease. Preventing weight gain during early adulthood could mitigate later-life chronic disease risk. Increased dietary fibre intake, leading to enhanced colonic microbial fermentation and short-chain fatty acid (SCFA) production, is associated with lower body weight. Despite national food policy recommendations to consume 30 g of dietary fibre daily, only 9% of adults achieve this target. Inulin-propionate ester (IPE) selectively increases production of the SCFA propionate in the colon. In previous studies, IPE prevented weight gain in middle-aged adults over 6 months compared with the inulin control. IPE is a novel food ingredient with a potential health benefit that can be added to various commonly consumed foods. This 12-month study aimed to determine whether using IPE to increase colonic propionate prevents further weight gain in overweight younger adults. Methods: This multi-centre, randomised controlled, double-blind trial was conducted in London and Glasgow, UK. Recruited participants were individuals at risk of weight gain, aged between 20 and 40 years, with an overweight body mass index. Sealed Envelope Software was used to randomise participants to consume 10 g of IPE or inulin (control) once per day for 12 months. The primary outcome was weight gain from baseline to 12 months, analysed using an intention-to-treat strategy. The safety profile and tolerability of IPE were monitored through adverse events and compliance. This study is registered with the International Standard Randomised Controlled Trials (ISRCT) Database (ISRCT number: 16299902). Findings: Participants (n = 135 per study arm) were recruited from July 2019 to October 2021. At 12 months, there was no significant difference in baseline-adjusted mean weight gain for IPE compared with control (1.02 kg, 95% CI: −0.37 to 2.41; p = 0.15; n = 226).
Unadjusted mean weight gain reached the expected 2 kg threshold in neither the IPE arm (+1.22 kg) nor the control arm (+0.07 kg). In the IPE arm, fat-free mass was greater by 1.07 kg (95% CI: 0.21–1.93) and blood glucose was higher by 0.11 mmol/L (95% CI: 0.01–0.21). Compliance, defined as intake of ≥50% of sachets, was achieved by 63% of IPE participants. There were no unexpected adverse events or safety concerns. Interpretation: Our study indicates that at 12 months, IPE did not differentially affect weight gain, compared with the inulin control, in adults between 20 and 40 years of age at risk of obesity. Funding: NIHR EME Programme (15/185/16).
Article
Foot-and-mouth disease (FMD) is a highly contagious disease of cloven-hoofed animals and poses an economic threat to the livestock industry in the United States. Because vaccines composed of partially purified structural proteins of the FMD virus (FMDV) may be used, it is important to test samples from infected and vaccinated animals with a competitive ELISA that detects antibodies against nonstructural proteins (NSPs) of FMDV. Our study extends the diagnostic validation of the Prionics ELISA (Thermo Fisher) and the VMRD ELISA. We used diverse serum sample sets from bovine, porcine, and other cloven-hoofed animals to evaluate analytical specificity and sensitivity, diagnostic specificity and sensitivity, and differentiation of infected from vaccinated animals (DIVA), following validation guidelines outlined by the World Organisation for Animal Health (WOAH). Both tests were analytically 100% accurate. The VMRD test was diagnostically more sensitive than the Prionics test, whereas the Prionics test was diagnostically more specific than the VMRD test. Both tests differentiated infected from vaccinated animals. On the basis of these data, both the VMRD and Prionics ELISAs can be used for serodetection of FMDV antibodies at the Foreign Animal Disease Diagnostic Laboratory and in National Animal Health Laboratory Network laboratories.
Article
Introduction: Our objective was to assess the non-inferiority of our institution's approach of combined 10 IU IM (intramyometrial) and 10 IU IV (intravenous) oxytocin, compared with IV carbetocin, in preventing severe postpartum blood loss in elective cesarean sections. The design was a prospective controlled phase IV non-inferiority interventional trial. The setting was a tertiary center at University Hospital, Zurich, Switzerland. Material and Methods: The population consisted of 550 women at low risk of postpartum hemorrhage (PPH) undergoing elective cesarean section after 36 completed weeks of gestation. Participants were assigned to either the combined oxytocin regimen (10 IU IM and 10 IU IV) or carbetocin (100 μg IV). Non-inferiority of oxytocin for severe PPH was assessed with a 0.05 margin using the Newcombe–Wilson score method. The main outcome measure was severe postpartum blood loss, defined as delta hemoglobin (∆Hb = Hb prepartum − Hb postpartum) ≥30 g/L. Results: Non-inferiority of combined oxytocin (IM/IV) in preventing severe postpartum blood loss was not shown (17 women in the oxytocin group vs. 7 in the carbetocin group). The number needed to treat with carbetocin was 28. The risk difference for ∆Hb ≥30 g/L was 0.04 (oxytocin 0.06 vs. carbetocin 0.03), 95% confidence interval (CI) 0.00 to 0.08. No significant difference was observed for ∆Hb (median 12 [IQR 7.0–19.0] vs. 11 [5.0–17.0], p = 0.07), estimated blood loss (median 500 [IQR 400–600] vs. 500 [400–575], p = 0.38), or the PPH rate defined as estimated blood loss ≥1000 mL (12 [4.5] vs. 5 [2.0]; risk difference 0.03, 95% CI −0.01 to 0.06; p = 0.16). More additional uterotonics were administered in the oxytocin group than in the carbetocin group (15.2% vs. 5.9%, p = 0.001). Total case costs did not differ significantly between the oxytocin and carbetocin groups (US $10 146 vs. 9621; mean difference 471.4, CI −476.5 to 1419.3; p = 0.33).
Conclusions: Combined (IM/IV) oxytocin is not non-inferior to carbetocin with respect to severe postpartum blood loss, defined as a postpartum Hb decrease of ≥30 g/L, in elective cesarean sections. We recommend carbetocin for use in clinical practice for elective cesarean sections.
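The trial above names the Newcombe–Wilson score method for its risk-difference interval. A minimal sketch follows, assuming the uncorrected hybrid score variant (no continuity correction). That variant combines the Wilson score limits l1, u1 and l2, u2 of the two single proportions:

L = d − √((p1 − l1)² + (u2 − p2)²) and U = d + √((u1 − p1)² + (p2 − l2)²), where d = p1 − p2.

The function names and the 56/70 vs. 48/80 counts in the usage note are illustrative only. They are not the trial's data.

```python
import math


def wilson_interval(x: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval (no continuity correction) for a single proportion x/n."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def newcombe_hybrid_score(x1: int, n1: int, x2: int, n2: int, z: float = 1.959964):
    """Hybrid score interval for p1 - p2 built from the two Wilson intervals.

    Returns (point estimate, lower limit, upper limit).
    """
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1, z)
    l2, u2 = wilson_interval(x2, n2, z)
    d = p1 - p2
    # Each limit combines the distances from each estimate to its own Wilson limit
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return d, lower, upper
```

For illustrative counts, `newcombe_hybrid_score(56, 70, 48, 80)` gives a difference of 0.2 with limits of about 0.0524 and 0.3339. In a non-inferiority setting like the trial's, with the difference oriented as p(oxytocin) − p(carbetocin) and a 0.05 margin, non-inferiority requires the upper limit to fall below 0.05. The abstract's reported interval (0.00 to 0.08) does not meet that condition, which is consistent with its conclusion.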
Article
We consider inference for a scalar parameter Ψ in the presence of one or more nuisance parameters. The nuisance parameters are required to be orthogonal to the parameter of interest, and the construction and interpretation of orthogonalized parameters is discussed in some detail. For purposes of inference we propose a likelihood ratio statistic constructed from the conditional distribution of the observations, given maximum likelihood estimates for the nuisance parameters. We consider to what extent this is preferable to the profile likelihood ratio statistic in which the likelihood function is maximized over the nuisance parameters. There are close connections to the modified profile likelihood of Barndorff‐Nielsen (1983). The normal transformation model of Box and Cox (1964) is discussed as an illustration.
Article
For X with a Binomial(n, p) distribution, Section 1 gives a one-page table of .95 and .99 confidence intervals for p, for n = 1, 2, …, 30. This interval is equivariant under X → n − X and p → 1 − p, has approximately equal probability tails, is approximately unbiased, has Crow’s property of minimizing the sum of the n + 1 possible lengths, and each of its ends is increasing in X and decreasing in n with about as regular steps as possible. Sections 2 and 3 consider the usual approximate confidence intervals. Calculations and asymptotic results show the need for the continuity correction in these even when n is large. Because of the nonuniformity of the Binomial → Normal convergence as p → 0, these intervals fail to have their stated asymptotic confidence coefficients; simple corrections are given for this. In the approximate interval (X/n) ± {(c/√n)√[(X/n)(1 − X/n)] + 1/(2n)}, it is shown that the factor c/√n should be replaced by c/√(n − c² − 2c/√n − 1/n).
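The continuity-corrected approximate interval quoted above, and the replacement factor the abstract recommends, translate directly into code. The function name `cc_wald_interval` and the `adjusted` flag are hypothetical. The limits are not clipped to [0, 1], because the abstract says nothing about truncation. The default `c` of about 1.96 assumes a .95 interval.

```python
import math


def cc_wald_interval(x: int, n: int, c: float = 1.959964, adjusted: bool = False) -> tuple[float, float]:
    """Continuity-corrected interval (X/n) +/- {k*sqrt[(X/n)(1 - X/n)] + 1/(2n)}.

    k = c/sqrt(n) for the usual interval; with adjusted=True,
    k = c/sqrt(n - c**2 - 2c/sqrt(n) - 1/n), the replacement factor quoted in the text.
    Limits are returned unclipped, so they may fall outside [0, 1].
    """
    p = x / n
    if adjusted:
        eff_n = n - c**2 - 2 * c / math.sqrt(n) - 1 / n
        if eff_n <= 0:
            raise ValueError("n is too small for the adjusted factor")
        k = c / math.sqrt(eff_n)
    else:
        k = c / math.sqrt(n)
    half = k * math.sqrt(p * (1 - p)) + 1 / (2 * n)
    return p - half, p + half
```

Because n − c² − 2c/√n − 1/n is smaller than n, the adjusted factor always widens the interval. For example, with X = 10, n = 20, and c = 1.96, the unadjusted interval is about (0.2559, 0.7441), and the adjusted one is wider.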