Statistical Methods in Medical Research 2006; 15: 547–569
Efficient interval estimation for age-adjusted
cancer rates
Ram C Tiwari National Cancer Institute, NIH, Bethesda, MD, USA, Limin X Clegg Office
of Healthcare Inspections, OIG, Department of Veterans Affairs, Washington, DC, USA and
Zhaohui Zou Information Management Services, Inc., Silver Spring, MD, USA
The age-adjusted cancer rates are defined as the weighted average of the age-specific cancer rates, where the
weights are positive, known, and normalized so that their sum is 1. Fay and Feuer developed a confidence
interval for a single age-adjusted rate based on the gamma approximation. Fay used the gamma approximations
to construct an F interval for the ratio of two age-adjusted rates. Modifications of the gamma
and F intervals are proposed and a simulation study is carried out to show that these modified gamma
and modified F intervals are more efficient than the gamma and F intervals, respectively, in the sense that
the proposed intervals have empirical coverage probabilities less than or equal to their counterparts, and
that they also retain the nominal level. The normal and beta confidence intervals for a single age-adjusted
rate are also provided, but they are shown to be slightly liberal. Finally, for comparing two correlated
age-adjusted rates, the confidence intervals for the difference and for the ratio of the two age-adjusted rates
are derived incorporating the correlation between the two rates. The proposed gamma and F intervals
and the normal intervals for the correlated age-adjusted rates are recommended to be implemented in the
Surveillance, Epidemiology and End Results Program of the National Cancer Institute.
1 Introduction
Despite rapid advances in medicine, cancer continues to be a major public health concern
in the US and around the world. The total number of deaths due to cancer continues
to rise, even though the age-adjusted mortality rates for many common cancer sites
continue to decline [1]. Many public and private agencies dealing with cancer and related
issues depend on these statistics for planning and resource allocation. Such figures have
important social and economic ramifications, from deciding which programs get funded,
to deciding how funds are allocated among various programs. Having reliable and accu-
rate confidence intervals (CIs) for the means of the age-adjusted cancer mortality and
incidence rates for recent years is very important for everyone concerned. The higher the
coverage probabilities of the CIs, the more conservative the CIs are. Therefore, a desir-
able property of these CIs is that while retaining the nominal level, they have coverage
probabilities as close to the nominal level as possible.
In the US, the data on cancer mortality are obtained from death certificates. Due to
administrative and procedural delays, these data become fully available to the public
Address for correspondence: Limin Clegg, Office of Healthcare Inspections, VA OIG 801 1 Street, NW,
Room 1013, Washington, DC 20001, USA. E-mail: lin_clegg@va.gov
© 2006 SAGE Publications 10.1177/0962280206070621
548 RC Tiwari, LX Clegg and Z Zou
from the National Center for Health Statistics (NCHS) after approximately three years.
The cancer incidence and mortality data are also available from the Surveillance,
Epidemiology and End Results (SEER) Program of the National Cancer Institute
(NCI). The SEER Program is an authoritative source for the cancer incidence and survival
data in the US. Population data are available from the US Census Bureau. The
American Cancer Society (ACS) publishes reports on cancer trends in their widely circulated
annual publication [2], Cancer Facts & Figures, which is also available online:
http://www.cancer.org/.
The state-level age-adjusted cancer (incidence or mortality) rates are given by
$$r_i = \sum_{j=1}^{J} w_j \frac{d_{ij}}{n_{ij}}, \quad i = 1, \ldots, I,$$
where $d_{ij}$ and $n_{ij}$ are the number of cancer (incidence or mortality) counts and the count
of mid-year population for the age-group $j$ and the state $i$, respectively, and the $w_j$ are
the normalized proportions of mid-year population for the age-group $j$ in the standard
population, so that $\sum_{j=1}^{J} w_j = 1$. In the SEER Program, for each of the 51 regions (50
states and Washington D.C.) in the US, there are 19 standard age-groups consisting of
0-<1, 1-4, 5-9, ..., 85+. The US-level age-adjusted cancer (incidence or mortality)
rates are given by
$$r = \sum_{j=1}^{J} w_j \frac{d_j}{n_j}$$
with $d_j = \sum_{i=1}^{I} d_{ij}$ and $n_j = \sum_{i=1}^{I} n_{ij}$. The SEER Program contains age-adjusted
mortality rates, based on the 2000 US standard population, for the US and for each of the
51 regions by cancer sites. The age-adjusted mortality rates for a selected number of
cancer sites and a number of countries in the world are also reported in the Cancer
Facts & Figures publication [2]. These age-adjusted rates are based on the World Health
Organization's world standard population. Thus, the results of this paper, even though
discussed in the context of the age-adjusted mortality rates for the US, apply to similar
data sets for other countries.
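The two rate definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the three-age-group numbers are hypothetical and are not SEER data, and the weights are renormalized inside the function so that they sum to 1.

```python
def age_adjusted_rate(weights, counts, populations):
    """Directly standardized rate: sum_j w_j * d_j / n_j with normalized w_j."""
    if not (len(weights) == len(counts) == len(populations)):
        raise ValueError("all inputs must have one entry per age-group")
    total_weight = sum(weights)
    return sum(
        (w / total_weight) * d / n
        for w, d, n in zip(weights, counts, populations)
    )


def pooled_rate(weights, counts_by_region, populations_by_region):
    """US-level rate: pool counts and populations over regions, then standardize."""
    pooled_counts = [sum(col) for col in zip(*counts_by_region)]
    pooled_pops = [sum(col) for col in zip(*populations_by_region)]
    return age_adjusted_rate(weights, pooled_counts, pooled_pops)


# Hypothetical example with J = 3 age-groups and I = 2 regions.
w = [0.2, 0.5, 0.3]
d = [[0, 2, 6], [1, 2, 3]]
n = [[1000, 2000, 3000], [1000, 2000, 3000]]
r_1 = age_adjusted_rate(w, d[0], n[0])
r_us = pooled_rate(w, d, n)
```

Pooling before standardizing mirrors the definition of $r$ through $d_j$ and $n_j$; averaging the regional rates $r_i$ would in general give a different number.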
For each $i$ ($i = 1, \ldots, I$), let $d_{(-i)j} = d_j - d_{ij}$ and $n_{(-i)j} = n_j - n_{ij}$ and define
$$r_{(-i)} = \sum_{j=1}^{J} w_j \frac{d_{(-i)j}}{n_{(-i)j}}$$
to be the age-adjusted rate for the rest of the US after deleting the region $i$.
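Let $D_{ij}$, $D_j$, $D_{(-i)j}$, $R_i$, $R_{(-i)}$ and $R$ denote the random variables whose realizations
are $d_{ij}$, $d_j$, $d_{(-i)j}$, $r_i$, $r_{(-i)}$ and $r$, respectively. We assume that the $D_{ij}$ are independent
Poisson random variables [3] with parameters $\lambda_{ij}$, that is, $D_{ij} \sim_{ind} Po(n_{ij}\lambda_{ij})$. Note that by
the moment generating function, $D_{(-i)j} \sim Po(\sum_{i' \neq i} n_{i'j}\lambda_{i'j})$ and $D_j \sim Po(\sum_{i=1}^{I} n_{ij}\lambda_{ij})$.
Let $\xi_{ij} = n_{ij}/n_j$, $\xi_{(-i)j} = \sum_{i' \neq i} \xi_{i'j}$ and $\xi_j = n_j/n$, where $n = \sum_{j=1}^{J} n_j = \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}$.
Let $\mu_i$, $\mu_{(-i)}$, $\mu$, $\nu_i = \sigma_i^2/n$, $\nu_{(-i)} = \sigma_{(-i)}^2/n$ and $\nu \equiv \sigma^2/n$ be the means and variances
of $R_i$, $R_{(-i)}$ and $R$, respectively, and let $\rho_i/n$ be $Cov(R_i, R)$, where their explicit
expressions are derived in Appendix A. Let $w_{ij} = w_j/n_{ij}$ and define the estimates of $\mu_i$,
$\mu_{(-i)}$, $\mu$, $\sigma_i^2$, $\sigma_{(-i)}^2$, $\sigma^2$ and $\rho_i$ as
$$\hat\mu_i = r_i; \quad \hat\mu_{(-i)} = r_{(-i)}; \quad \hat\mu = r$$
$$\hat\sigma_i^2 = n \sum_{j=1}^{J} w_{ij}^2 d_{ij}; \quad \hat\sigma_{(-i)}^2 = n \sum_{j=1}^{J} w_j^2 \frac{d_{(-i)j}}{n_{(-i)j}^2}$$
$$\hat\sigma^2 = n \sum_{j=1}^{J} w_j^2 \frac{d_j}{n_j^2}; \quad \hat\rho_i = n \sum_{j=1}^{J} w_{ij} \frac{w_j}{n_j} d_{ij}$$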
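For a rare cancer site, as the observed total counts $d_i$ are very small with $d_{ij} = 0$ plausibly
for several $j$, the value of $r_i$ is either close to 0 or equal to 0. As we will see subsequently,
when $r_i = 0$, the gamma interval of Fay and Feuer [4] is not defined. To avoid
such situations, we introduce a correction factor, which amounts to distributing a count
of 1 uniformly to all $J$ categories, and hence adding $1/J$, the expected value under a multinomial
distribution with parameters 1 and cell probabilities $1/J$, to $d_{ij}$, $j = 1, \ldots, J$, in the
calculation of the estimates of $\mu_i$, $\sigma_i^2$ and $\rho_i$. We redefine $r_i$ as
$$\tilde r_i = \sum_{j=1}^{J} w_{ij}\left(d_{ij} + \frac{1}{J}\right) = r_i + \bar w_i$$
where $\bar w_i = (1/J)\sum_{j=1}^{J} w_{ij}$, and modify the estimates of $\mu_i$, $\sigma_i^2$, $\sigma_{(-i)}^2$ and $\rho_i$ accordingly
by replacing $d_{ij}$ by $(d_{ij} + 1/J)$. Thus,
$$\tilde\mu_i = \tilde r_i; \quad \tilde\sigma_i^2 = n \sum_{j=1}^{J} w_{ij}^2\left(d_{ij} + \frac{1}{J}\right); \quad \tilde\rho_i = n \sum_{j=1}^{J} w_{ij}\frac{w_j}{n_j}\left(d_{ij} + \frac{1}{J}\right)$$
Note that $\tilde r_i \approx r_i$ for common cancer sites as $\bar w_i \approx 0$. Let
$$\hat\nu_i = \frac{\hat\sigma_i^2}{n}; \quad \tilde\nu_i = \frac{\tilde\sigma_i^2}{n}; \quad \hat\nu = \frac{\hat\sigma^2}{n}$$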
The objectives of this paper include the construction of CIs for parameters such as
i) the mean $\mu_i$ of the age-adjusted rate for the region $i$; ii) the mean $\mu$ of the age-adjusted
rate for the US; iii) the ratio of the mean age-adjusted rates $\mu_i/\mu_{i'}$ for region $i$ to region
$i'$; iv) the ratio of the mean age-adjusted rates $\mu_i/\mu_{(-i)}$ for region $i$ to the rest of the US;
v) the ratio of the mean age-adjusted rates $\mu_i/\mu$ for region $i$ to the US; and vi) the
difference of the mean age-adjusted rates $\mu_i - \mu$ between region $i$ and the US. Fay and
Feuer [4] derived a CI for $\mu_i$ (or $\mu$) assuming that a mixture of Poisson distributions can be
approximated by a gamma distribution and compared the performance of the gamma
intervals with the approximate bootstrap confidence (ABC) intervals [5-7] and the 'chi-squared'
intervals of Dobson et al. [8] through simulations. They observed that the gamma
intervals retained at least the nominal coverage and were more conservative than the ABC
intervals and chi-squared intervals. We propose a modification of the gamma interval
for $\mu_i$ (or $\mu$) developed by Fay and Feuer [4] and derive new CIs for $\mu_i$ (and $\mu$) based on
the beta and normal approximations of $R_i$ (and $R$).
Fay [9] used the gamma approximation of Fay and Feuer [4] and developed a CI, based on
an approximate F distribution, for the ratio of two age-adjusted rates that can be applied
to $\mu_i/\mu_{i'}$ and $\mu_i/\mu_{(-i)}$, but not to $\mu_i/\mu$, as the age-adjusted rate for the US involves the
counts from the region $i$. We also propose a modification of the F interval of Fay [9]. We use
the normal approximations of $R_i/R_{i'}$, $R_i/R_{(-i)}$, $R_i/R$ and $R_i - R$, taking into account
the correlation between $R_i$ and $R$, and construct the CIs for $\mu_i/\mu_{i'}$, $\mu_i/\mu_{(-i)}$, $\mu_i/\mu$ and
$\mu_i - \mu$. It is important to mention that for comparing the state and US level age-adjusted
rates, the current procedure [10] is to use the normal CI for $\mu_i - \mu$ based on $\rho_i = 0$. For
simulations, we use the observed age-adjusted mortality rates for the 51 regions and the
US for year 2002 from the SEER Program for a rare cancer site, tongue cancer.
The rest of the paper is organized as follows. In Section 3, we briefly review the works
of Fay and Feuer [4] and Fay [9] and present the modified gamma and F intervals. We also
derive the CIs for the ratio of the means of two age-adjusted rates, namely the age-adjusted
rates of any two regions, any region to the rest of the country, and any region to
the entire country, and for the difference of the means of the age-adjusted rates between
a region and the country. Simulations are carried out in Section 4, and we discuss our
findings in Section 5. The conclusions are presented in Section 6.
2 Confidence intervals for age-adjusted rates
2.1 Gamma and F approximations
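Note that if $X \sim Po(\theta)$, then for an integer $x \geq 0$,
$$P(X \geq x \mid \theta) = \int_0^{\theta} f_Z(z \mid x, 1)\,dz$$
where $Z \sim G(x, 1) =_d \tfrac{1}{2}\chi^2_{2x}$ and, in general, $G(\alpha, \beta) =_d (\beta/2)\chi^2_{2\alpha}$ (allowing
non-integer degrees of freedom) has density
$$f_Z(x \mid \alpha, \beta) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} \exp\left(-\frac{x}{\beta}\right) x^{\alpha - 1}, \quad x > 0$$
with mean $E(Z) = \alpha\beta$ and variance $Var(Z) = \alpha\beta^2$. Let $x$ be the observed value of $X$
and let $(L(x;\alpha), U(x;\alpha))$ denote the $100(1-\alpha)\%$ CI for $\theta$, where $L(x;\alpha)$ is obtained
by solving the equation
$$P(X \geq x \mid \theta = L(x;\alpha)) = \frac{\alpha}{2}$$
and $U(x;\alpha)$ is obtained by solving
$$P(X \leq x \mid \theta = U(x;\alpha)) = \frac{\alpha}{2}$$
or equivalently by solving
$$P(X > x \mid \theta = U(x;\alpha)) = P(X \geq x + 1 \mid \theta = U(x;\alpha)) = 1 - \frac{\alpha}{2}$$
Thus, $L(x;\alpha) = \tfrac{1}{2}(\chi^2_{2x})^{-1}(\alpha/2)$ and $U(x;\alpha) = \tfrac{1}{2}(\chi^2_{2(x+1)})^{-1}(1 - \alpha/2)$. Fay and
Feuer [4] called the interval $(L(x;\alpha), U(x;\alpha))$ 'exact' while others, for example, Johnson
and Kotz [11], use the term 'approximate' interval.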
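The two defining equations for $L(x;\alpha)$ and $U(x;\alpha)$ can be solved numerically without chi-squared tables. The sketch below is illustrative only: it assumes plain bisection on the Poisson distribution function, the function names are hypothetical, and the bracketing bound is a heuristic that suits small to moderate counts.

```python
import math


def poisson_cdf(k, theta):
    """P(X <= k) for X ~ Po(theta); returns 0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-theta)
    total = term
    for m in range(1, k + 1):
        term *= theta / m
        total += term
    return min(total, 1.0)


def _bisect_increasing(f, lo, hi, iterations=200):
    """Root of a function that increases from negative to positive on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_poisson_ci(x, alpha=0.05):
    """Limits solving P(X >= x | L) = alpha/2 and P(X <= x | U) = alpha/2."""
    hi = x + 10.0 * math.sqrt(x + 1.0) + 20.0  # heuristic upper bracket
    if x == 0:
        lower = 0.0  # convention: the lower limit is taken as 0 when x = 0
    else:
        # P(X >= x | theta) = 1 - P(X <= x - 1 | theta) increases with theta.
        lower = _bisect_increasing(
            lambda t: (1.0 - poisson_cdf(x - 1, t)) - alpha / 2, 0.0, hi
        )
    # P(X <= x | theta) decreases with theta, so its negative increases.
    upper = _bisect_increasing(lambda t: alpha / 2 - poisson_cdf(x, t), 0.0, hi)
    return lower, upper


lower_5, upper_5 = exact_poisson_ci(5)
```

By the identity at the start of this subsection, the same limits equal $\tfrac{1}{2}(\chi^2_{2x})^{-1}(\alpha/2)$ and $\tfrac{1}{2}(\chi^2_{2(x+1)})^{-1}(1-\alpha/2)$, so a chi-squared quantile routine can be used in place of the bisection.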
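Let $w_{i(1)} \leq \cdots \leq w_{i(J)}$ be the ordered values of $w_{ij}$, $j = 1, \ldots, J$. Fay and Feuer [4]
assumed that a mixture of Poisson distributions is approximately distributed as a gamma
distribution; that is,
$$P\left(\sum_{j=1}^{J} w_{ij}D_{ij} \geq y \,\Big|\, \mu_i, \nu_i\right) \approx \int_0^{\mu_i} f_{Z_i}\left(z \,\Big|\, \frac{y^2}{\nu_i}, \frac{\nu_i}{y}\right) dz$$
where $Z_i \sim G(y^2/\nu_i, \nu_i/y)$. This assumption essentially means that the distribution of
a linear combination of independent Poisson random variables is approximately
a gamma distribution with mean and variance equal to the mean and variance
of the linear combination, respectively.
Fay and Feuer [4] used this approximation to construct approximate $100(1-\alpha)\%$ CIs for
the true age-adjusted rates $\mu_i$.
The lower confidence limit $L(r_i;\alpha)$ was obtained by solving the equation
$$\frac{\alpha}{2} = P\left(\sum_{j=1}^{J} w_{ij}D_{ij} \geq r_i \,\Big|\, \mu_i, \nu_i\right) \approx \int_0^{L(\mu_i;\nu_i)} f_{Z_i}\left(z \,\Big|\, \frac{r_i^2}{\nu_i}, \frac{\nu_i}{r_i}\right) dz$$
This yields
$$L(r_i;\hat\nu_i;\alpha) = G^{-1}\left(\frac{\alpha}{2}; \frac{r_i^2}{\hat\nu_i}, \frac{\hat\nu_i}{r_i}\right) = \frac{\hat\nu_i}{2r_i}\left(\chi^2_{2r_i^2/\hat\nu_i}\right)^{-1}\left(\frac{\alpha}{2}\right)$$
where $G^{-1}$ is the inverse function of the gamma distribution function and $(\chi^2_l)^{-1}(\alpha)$
denotes the $100\alpha$th percentile of the chi-squared distribution with $l$ degrees of freedom.
Note that when $r_i = 0$, $L(r_i;\hat\nu_i;\alpha)$ is not defined. For the upper confidence limit $U(r_i;\alpha)$,
Fay and Feuer [4] solved the equation
$$1 - \frac{\alpha}{2} = P\left(\sum_{j=1}^{J} w_{ij}D_{ij} > r_i \,\Big|\, \mu_i, \nu_i\right) \geq P\left(\sum_{j=1}^{J} w_{ij}D_{ij} \geq r_i + w_{i(J)} \,\Big|\, \mu_i, \nu_i\right)$$
$$\approx \int_0^{U(\mu_i;\nu_i)} f_{Z_i}\left(z \,\Big|\, \frac{(r_i + w_{i(J)})^2}{\nu_i + w_{i(J)}^2}, \frac{\nu_i + w_{i(J)}^2}{r_i + w_{i(J)}}\right) dz$$
resulting in
$$U(r_i;\hat\nu_i, w_{i(J)};\alpha) = G^{-1}\left(1 - \frac{\alpha}{2}; \frac{r_i^2}{\hat\nu_i}, \frac{\hat\nu_i}{r_i}, w_{i(J)}\right) = \frac{\hat\nu_i + w_{i(J)}^2}{2(r_i + w_{i(J)})}\left(\chi^2_{2(r_i + w_{i(J)})^2/(\hat\nu_i + w_{i(J)}^2)}\right)^{-1}\left(1 - \frac{\alpha}{2}\right)$$
Fay and Feuer [4] performed simulations to study the performance of their gamma CIs
$(L(r_i;\hat\nu_i;\alpha), U(r_i;\hat\nu_i;w_{i(J)};\alpha))$ and found that the upper confidence limits were more
conservative than those based on the ABC intervals [5-7] and the chi-squared intervals of
Dobson et al. [8], henceforth referred to as DKES intervals. For completeness, the ABC and
DKES intervals are given in Appendix B.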
Fay and Feuer [4] mentioned that when the weights $w_{ij}$ are equal to a constant, $c_i > 0$ say,
for all $j$, the CI $(c_i L(d_i;\alpha), c_i U(d_i;\alpha))$ for $\mu_i = E(\sum_{j=1}^{J} w_{ij}D_{ij}) = c_i E(D_i)$ is
exact, with $D_i \sim Po(\sum_{j=1}^{J} n_{ij}\lambda_{ij})$. However, since the $w_{ij} = w_j/n_{ij}$ depend on both
the standards $w_j$ and the mid-year populations $n_{ij}$, the condition that the $w_{ij}$ are equal
to a constant for all $j$ is not easily satisfied. For example, a sufficient condition for it
to hold is that the $w_j$ are all equal and the $n_{ij}$ are all equal. Another sufficient condition
for the $w_{ij}$ to be equal to a constant for all $j$ is that $n_{ij}$ is proportional to $w_j$,
independent of $i$, for all $j$. If a populous state like California or New York has an age-group
distribution of its population similar to that of the entire US, then for that state,
one may expect $n_{ij}$ to be proportional to $w_j$ and hence the CI for $\mu_i$ to be exact.
Since $w_{i(l)} \leq w_{i(l+1)}$, we have
$$P\left(\sum_{j=1}^{J} w_{ij}D_{ij} > r_i \,\Big|\, \mu_i, \nu_i\right) \geq P\left(\sum_{j=1}^{J} w_{ij}D_{ij} \geq r_i + w_{i(l)} \,\Big|\, \mu_i, \nu_i\right) \geq P\left(\sum_{j=1}^{J} w_{ij}D_{ij} \geq r_i + w_{i(l+1)} \,\Big|\, \mu_i, \nu_i\right), \quad l = 1, \ldots, J - 1$$
Thus, proceeding as above, one can construct the upper confidence limits
$U(r_i;\hat\nu_i;w_{i(1)};\alpha), U(r_i;\hat\nu_i;w_{i(2)};\alpha), \ldots, U(r_i;\hat\nu_i;w_{i(J)};\alpha)$, varying from the most
liberal upper limit to the most conservative upper limit. In fact, there are an infinite
number of choices for such an upper confidence limit.
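As a compromise, we propose an upper limit that is based on the mean
$\bar w_i = (1/J)\sum_{j=1}^{J} w_{ij}$ and that depends on all $w_{i(l)}$, $l = 1, \ldots, J$. As mentioned earlier, this
assumes distributing a count of 1 uniformly to all $J$ age-groups. Thus,
$$1 - \frac{\alpha}{2} = P\left(\sum_{j=1}^{J} w_{ij}D_{ij} > r_i \,\Big|\, \mu_i, \nu_i\right) \geq P\left(\sum_{j=1}^{J} w_{ij}D_{ij} \geq r_i + \bar w_i \,\Big|\, \mu_i, \nu_i\right) = P\left(\sum_{j=1}^{J} w_{ij}D_{ij} \geq \tilde r_i \,\Big|\, \mu_i, \nu_i\right)$$
Now, assuming that the $(d_{ij} + 1/J)$ have means equal to their variances, similar to the Poisson
distribution, so that
$$Var\left(\sum_{j=1}^{J} w_{ij}\left(d_{ij} + \frac{1}{J}\right)\right) = \sum_{j=1}^{J} w_{ij}^2\left(d_{ij} + \frac{1}{J}\right)$$
and using the gamma approximation, the upper confidence limit for $\mu_i$ is given by
$$U(\tilde r_i;\tilde\nu_i;\bar w_i;\alpha) = \frac{\tilde\nu_i}{2\tilde r_i}\left(\chi^2_{2\tilde r_i^2/\tilde\nu_i}\right)^{-1}\left(1 - \frac{\alpha}{2}\right)$$
Therefore, the proposed gamma CI for $\mu_i$ is $(L(r_i;\hat\nu_i;\alpha), U(\tilde r_i;\tilde\nu_i;\bar w_i;\alpha))$. Another
approximation of the upper confidence limit based on the mean $\bar w_i$ can be obtained
by using $(\hat\nu_i + \bar w_i^2)$ instead of $\tilde\nu_i$. This results in the following CI:
$(L(r_i;\hat\nu_i;\alpha), U(\tilde r_i;\hat\nu_i + \bar w_i^2;\bar w_i;\alpha))$. Through simulations (not shown here), we found that these two intervals
performed very similarly. Therefore, we will focus only on $(L(r_i;\hat\nu_i;\alpha), U(\tilde r_i;\tilde\nu_i;\bar w_i;\alpha))$.
Note that the lower limits of the gamma interval of Fay and Feuer [4] and of the modified
gamma interval are the same. We shall define the CI for $\mu_i$ when $r_i = 0$ as
$(0, U(\tilde r_i;\tilde\nu_i;\bar w_i;\alpha))$, thus ensuring a coverage probability of at least $(1 - \alpha)$.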
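As a rough numerical illustration of the two intervals, the sketch below computes the Fay and Feuer limits and the modified gamma limits from a vector of weights $w_{ij}$ and counts $d_{ij}$. Everything about the implementation is an assumption made for illustration: the gamma quantile is obtained by bisection on a power-series evaluation of the regularized incomplete gamma function (a library quantile function would normally be used instead), the function names are hypothetical, and the series is only suitable for moderate shape parameters.

```python
import math


def gamma_cdf(x, shape):
    """Regularized lower incomplete gamma P(shape, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-17 * total and n < 100000:
        n += 1
        term *= x / (shape + n)
        total += term
    log_p = shape * math.log(x) - x - math.lgamma(shape) + math.log(total)
    return min(1.0, math.exp(log_p))


def gamma_ppf(p, shape, scale=1.0):
    """Quantile of G(shape, scale), found by bisection on the CDF."""
    lo, hi = 0.0, shape + 10.0 * math.sqrt(shape) + 30.0  # heuristic bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return scale * 0.5 * (lo + hi)


def _rate_and_variance(w, d, add=0.0):
    """r = sum w_j (d_j + add) and v = sum w_j^2 (d_j + add)."""
    r = sum(wj * (dj + add) for wj, dj in zip(w, d))
    v = sum(wj * wj * (dj + add) for wj, dj in zip(w, d))
    return r, v


def fay_feuer_interval(w, d, alpha=0.05):
    """Gamma interval with the largest weight added for the upper limit."""
    r, v = _rate_and_variance(w, d)
    if r == 0.0:
        raise ValueError("the lower limit is not defined when the rate is 0")
    w_max = max(w)
    ru, vu = r + w_max, v + w_max ** 2
    lower = gamma_ppf(alpha / 2, r * r / v, v / r)
    upper = gamma_ppf(1 - alpha / 2, ru * ru / vu, vu / ru)
    return lower, upper


def modified_gamma_interval(w, d, alpha=0.05):
    """Modified gamma interval: 1/J is added to every count for the upper limit."""
    r, v = _rate_and_variance(w, d)
    rt, vt = _rate_and_variance(w, d, add=1.0 / len(w))
    lower = gamma_ppf(alpha / 2, r * r / v, v / r) if r > 0.0 else 0.0
    upper = gamma_ppf(1 - alpha / 2, rt * rt / vt, vt / rt)
    return lower, upper
```

When all weights equal a constant $c_i$, both functions reduce to $c_i$ times the Poisson limits of the total count, which gives a convenient sanity check of the code against the exact case discussed above.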
Fay [9] developed a confidence interval for the ratio of two age-adjusted rates, $\mu_i/\mu_{i'}$, for
$\mu_i, \mu_{i'} > 0$, based on $R_i = \sum_{j=1}^{J} w_{ij}D_{ij}$ and $R_{i'} = \sum_{j=1}^{J} w_{i'j}D_{i'j}$, where $D_{ij}$ and $D_{i'j}$ are
independent. Assuming the gamma approximations for $R_i$ and $R_{i'}$, Fay [9] used the result
that, conditional on $D_{ij} + D_{i'j} = t_j$, the distribution of $D_{ij}$ is a binomial distribution with
parameters $t_j$ and $n_{ij}(\lambda_{ij}/\lambda_{i'j}) / \{n_{ij}(\lambda_{ij}/\lambda_{i'j}) + n_{i'j}\}$. For constructing the lower confidence
limit for $\mu_i/\mu_{i'}$, Fay [9] assumed that $\mu_i$ is distributed as gamma with mean $r_i$ and variance
$\hat\nu_i$ and that $\mu_{i'}$ is distributed as gamma with mean $(r_{i'} + W_{i'})$ and variance $(\hat\nu_{i'} + W_{i'}^2)$,
and used the result that, conditional on $t_j$,
$$\frac{r_{i'} + W_{i'}}{r_i}\,\frac{\mu_i}{\mu_{i'}} \sim F\left(\frac{2r_i^2}{\hat\nu_i}, \frac{2(r_{i'} + W_{i'})^2}{\hat\nu_{i'} + W_{i'}^2}\right)$$
where $W_{i'} = \max_{j: d_{i'j} < t_j}\{w_{i'j}\}$ and, for independent $\chi^2_m =_d G(m/2, 2)$ and $\chi^2_n =_d G(n/2, 2)$,
$F(m, n) =_d (\chi^2_m/m)/(\chi^2_n/n)$ denotes the F distribution with numerator degrees
of freedom (d.f.) $m$ and denominator d.f. $n$, with density given by
$$g(x \mid m, n) = \frac{\Gamma((m+n)/2)}{\Gamma(m/2)\Gamma(n/2)}\left(\frac{m}{n}\right)^{m/2}\frac{x^{(m/2)-1}}{(1 + (m/n)x)^{(m+n)/2}}, \quad 0 < x < \infty$$
Since the numerator and the denominator chi-squared random variables in
$F(2r_i^2/\hat\nu_i, 2(r_{i'} + W_{i'})^2/(\hat\nu_{i'} + W_{i'}^2))$ depend on $t_j$, the unconditional distribution of $\mu_i/\mu_{i'}$ is
a mixture of F distributions, and not an F distribution.
The lower confidence limit is
$$\frac{r_i}{r_{i'} + W_{i'}}\,F^{-1}_{(2r_i^2/\hat\nu_i,\; 2(r_{i'} + W_{i'})^2/(\hat\nu_{i'} + W_{i'}^2))}\left(\frac{\alpha}{2}\right)$$
where $F^{-1}_{(a,b)}(p)$ is the $p$th percentile of $F(a, b)$. Now, assuming that $\mu_i$ is distributed as
gamma with mean $(r_i + W_i)$ and variance $(\hat\nu_i + W_i^2)$ and that $\mu_{i'}$ is distributed as gamma
with mean $r_{i'}$ and variance $\hat\nu_{i'}$, Fay [9] derived the upper confidence limit to be
$$\frac{r_i + W_i}{r_{i'}}\,F^{-1}_{(2(r_i + W_i)^2/(\hat\nu_i + W_i^2),\; 2r_{i'}^2/\hat\nu_{i'})}\left(1 - \frac{\alpha}{2}\right)$$
Note that this approximation cannot be readily applied for constructing CIs for the ratios
$\mu_i/\mu$, that is, the ratios of the age-adjusted rates for the regions $i$ to the US age-adjusted
rate, as the latter depends on the former.
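We propose a modification in the above CI for $\mu_i/\mu_{i'}$. For the lower limit, we assume
that $\mu_i$ is distributed as gamma with mean $r_i$ and variance $\hat\nu_i$ and that $\mu_{i'}$ is distributed
as gamma with mean $\tilde r_{i'}$ and variance $\tilde\nu_{i'}$, and since the two distributions are independent
chi-squares, we have
$$\frac{\tilde r_{i'}}{r_i}\,\frac{\mu_i}{\mu_{i'}} \sim F\left(\frac{2r_i^2}{\hat\nu_i}, \frac{2\tilde r_{i'}^2}{\tilde\nu_{i'}}\right)$$
This results in the lower limit $(r_i/\tilde r_{i'})\,F^{-1}_{(2r_i^2/\hat\nu_i,\; 2\tilde r_{i'}^2/\tilde\nu_{i'})}(\alpha/2)$. Similarly, the upper limit
can be obtained. The proposed CI for $\mu_i/\mu_{i'}$ is
$$\left(\frac{r_i}{\tilde r_{i'}}\,F^{-1}_{(2r_i^2/\hat\nu_i,\; 2\tilde r_{i'}^2/\tilde\nu_{i'})}\left(\frac{\alpha}{2}\right),\; \frac{\tilde r_i}{r_{i'}}\,F^{-1}_{(2\tilde r_i^2/\tilde\nu_i,\; 2r_{i'}^2/\hat\nu_{i'})}\left(1 - \frac{\alpha}{2}\right)\right)$$
Another CI for $\mu_i/\mu_{i'}$ using $(r_i + \bar w_i)$ and $(\hat\nu_i + \bar w_i^2)$ instead of $\tilde r_i$ and $\tilde\nu_i$ is given by
$$\left(\frac{r_i}{r_{i'} + \bar w_{i'}}\,F^{-1}_{(2r_i^2/\hat\nu_i,\; 2(r_{i'} + \bar w_{i'})^2/(\hat\nu_{i'} + \bar w_{i'}^2))}\left(\frac{\alpha}{2}\right),\; \frac{r_i + \bar w_i}{r_{i'}}\,F^{-1}_{(2(r_i + \bar w_i)^2/(\hat\nu_i + \bar w_i^2),\; 2r_{i'}^2/\hat\nu_{i'})}\left(1 - \frac{\alpha}{2}\right)\right)$$
Once again, we mention that this interval performs similarly to the above one, and we
will not focus on it. We further remark that, unlike in Fay [9], these intervals do not
assume the dependence of the $w_{ij}$ on $t_j$.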
2.2 Normal approximations
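Define $R_{ij} = D_{ij}/n_{ij}$ ($= D_{ij}/(n\xi_{ij}\xi_j)$). Let $n \to \infty$ so that $0 < \xi_{ij}, \xi_j < 1$. Note that
$0 < \lambda_{ij} < \infty$. Then as $\min\{n_{ij}\lambda_{ij}\} \to \infty$,
$$\sqrt{n}\left(\frac{\xi_{ij}\xi_j}{\lambda_{ij}}\right)^{1/2}(R_{ij} - \lambda_{ij}) \longrightarrow_{ind} N(0, 1), \quad i = 1, \ldots, I;\; j = 1, \ldots, J$$
That is, the $R_{ij}$ are independent and asymptotically normally distributed,
$R_{ij} \sim AN(\lambda_{ij}, \lambda_{ij}/(n\xi_{ij}\xi_j))$. The other asymptotic results based on $R_{ij}$, the $100(1-\alpha)\%$ CIs for
$\mu_i$, $\mu$, $\mu_i/\mu_{i'}$, $\mu_i/\mu$, $\mu_i/\mu_{(-i)}$ and $\mu_i - \mu$, and their logarithmic and logit transformations
are presented in Appendix C. In particular, the $100(1-\alpha)\%$ CIs for $\mu_i/\mu$ and
$\mu_i - \mu$, based on the correlated age-adjusted rates, are given by
$$\frac{\mu_i}{\mu} = \left(\frac{\hat\mu_i}{\hat\mu} \pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2\hat\mu^2 + \hat\sigma^2\hat\mu_i^2 - 2\hat\rho_i\hat\mu_i\hat\mu}{n\hat\mu^4}}\right) \vee 0$$
$$\mu_i - \mu = \hat\mu_i - \hat\mu \pm z_{\alpha/2}\frac{\sqrt{\hat\sigma_i^2 + \hat\sigma^2 - 2\hat\rho_i}}{\sqrt{n}}$$
where $a \vee b = \max(a, b)$. When $\rho_i = 0$, which is true iff $\lambda_{ij} = 0$ for all $j$, these CIs reduce
to (see, e.g., Ries et al. [10] for the CI of $\mu_i - \mu$ when $\rho_i = 0$)
$$\frac{\mu_i}{\mu} = \left(\frac{\hat\mu_i}{\hat\mu} \pm z_{\alpha/2}\frac{1}{\sqrt{n}}\sqrt{\frac{1}{\hat\mu^4}\left(\hat\sigma_i^2\hat\mu^2 + \hat\sigma^2\hat\mu_i^2\right)}\right) \vee 0, \quad \mu_i - \mu = \hat\mu_i - \hat\mu \pm z_{\alpha/2}\frac{\sqrt{\hat\sigma_i^2 + \hat\sigma^2}}{\sqrt{n}}$$
Since $\rho_i > 0$, the CI for $\mu_i - \mu$ that ignores the adjustment for $\rho_i$ is wider,
and hence more conservative.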
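The normal CI for $\mu_i - \mu$ can be sketched directly from the estimates $\hat\sigma_i^2/n$, $\hat\sigma^2/n$ and $\hat\rho_i/n$ defined in the Introduction. The code below is a minimal illustration rather than the SEER implementation; the function name, the argument layout and the toy numbers are hypothetical, and the US counts are assumed to include the region's own counts.

```python
import math
from statistics import NormalDist


def normal_ci_difference(w, n_region, d_region, n_us, d_us,
                         alpha=0.05, use_correlation=True):
    """Normal CI for mu_i - mu; the covariance term is optional."""
    age_groups = range(len(w))
    w_region = [w[j] / n_region[j] for j in age_groups]  # w_ij = w_j / n_ij
    w_us = [w[j] / n_us[j] for j in age_groups]           # w_j / n_j

    r_i = sum(w_region[j] * d_region[j] for j in age_groups)
    r = sum(w_us[j] * d_us[j] for j in age_groups)

    var_region = sum(w_region[j] ** 2 * d_region[j] for j in age_groups)
    var_us = sum(w_us[j] ** 2 * d_us[j] for j in age_groups)
    # Estimated Cov(R_i, R): only the region's own counts are shared.
    cov = sum(w_region[j] * w_us[j] * d_region[j] for j in age_groups)
    if not use_correlation:
        cov = 0.0

    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(var_region + var_us - 2.0 * cov)
    diff = r_i - r
    return diff - half_width, diff + half_width


# Toy single-age-group example (hypothetical numbers).
correlated = normal_ci_difference([1.0], [100], [10], [1000], [50])
uncorrelated = normal_ci_difference([1.0], [100], [10], [1000], [50],
                                    use_correlation=False)
```

In the toy example both intervals are centred at the same difference, and the interval that sets the covariance to 0 is the wider of the two, in line with the remark that ignoring $\rho_i$ is conservative.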
2.3 Beta approximations
In general, the age-adjusted rates are less than 1, and equal to 1 if and only if there is
one age-group with both the cancer count and the population at risk for that age
group equal to 1, which is not a practical case. A rationale for the beta approximation
is as follows. Let $R_i = \sum_{j=1}^{J} w_j R_{ij}$, where $R_{ij} = D_{ij}/n_{ij}$. Let $D_{ij}$ and $\bar D_{ij}$ be independent
Poisson random variables with means $n_{ij}\lambda_{ij}$ and $n_{ij}(1 - \lambda_{ij})$, respectively. Then
$D_{ij} \mid D_{ij} + \bar D_{ij} = n_{ij}, \lambda_{ij} \sim Bin(n_{ij}, \lambda_{ij})$, a binomial distribution with parameters $n_{ij}$
and $\lambda_{ij}$ [12]. Using the result given in Appendix D, we can approximate the distribution of
$R_i$ by a beta distribution with parameters $\hat a_i$ and $\hat b_i$, $Be(\hat a_i, \hat b_i)$, where
$$\hat a_i = \tilde r_i\left(\frac{\tilde r_i(1 - \tilde r_i)}{\tilde\nu_i} - 1\right), \quad \hat b_i = (1 - \tilde r_i)\left(\frac{\tilde r_i(1 - \tilde r_i)}{\tilde\nu_i} - 1\right)$$
We define an approximate $100(1-\alpha)\%$ CI for $\mu_i$ as $(L_{\bar R_i}, U_{\bar R_i})$, where $L_{\bar R_i}$ and $U_{\bar R_i}$ are
obtained by solving the following incomplete beta integrals:
$$\int_0^{L_{\bar R_i}} B(x \mid \hat a_i, \hat b_i)\,dx = \frac{\alpha}{2}, \quad \int_0^{U_{\bar R_i}} B(x \mid \hat a_i, \hat b_i)\,dx = 1 - \frac{\alpha}{2}$$
Here, $B(x \mid a, b)$ is the density of a beta distribution, $Be(a, b)$, with parameters $a$ and $b$.
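The expressions for $\hat a_i$ and $\hat b_i$ are the usual moment-matching solution for a beta distribution with mean $\tilde r_i$ and variance $\tilde\nu_i$. A small sketch, with a hypothetical function name and toy inputs, makes the matching explicit; the confidence limits themselves would then be the $\alpha/2$ and $1 - \alpha/2$ quantiles of $Be(\hat a_i, \hat b_i)$, which are not computed here.

```python
def beta_moment_parameters(r_tilde, v_tilde):
    """Beta parameters (a, b) whose mean is r_tilde and variance is v_tilde."""
    if not 0.0 < r_tilde < 1.0:
        raise ValueError("the rate must lie strictly between 0 and 1")
    common = r_tilde * (1.0 - r_tilde) / v_tilde - 1.0
    if common <= 0.0:
        raise ValueError("the variance is too large for a beta distribution")
    return r_tilde * common, (1.0 - r_tilde) * common


def beta_mean_variance(a, b):
    """Mean and variance of Be(a, b), used to check the matching."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, variance


# Toy inputs (not cancer rates): mean 0.2 and variance 0.01.
a_hat, b_hat = beta_moment_parameters(0.2, 0.01)
mean_check, variance_check = beta_mean_variance(a_hat, b_hat)
```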
3 Examples and simulations
As an illustration, age-adjusted tongue cancer mortality rates were calculated for each
of the regions. Tongue cancer occurs mostly among the elderly. The 2002 mortality
data for tongue cancer, even though available from the NCHS, were obtained from
the SEER Program of the NCI (see the web site: www.seer.cancer.gov). We carried out
two different simulation studies to compare the performance of the proposed gamma,
beta and normal (with lower limits truncated at 0) intervals with that of the gamma interval
of Fay and Feuer [4]. In the first simulation study, we took the true means of the Poisson
distributions of $D_{ij}$ to be the observed numbers of deaths in the $(i, j)$th cell, where $i$ indexes
the 51 regions of the US (50 states and Washington DC) and $j$ indexes the 19
age-groups ($i = 1, \ldots, 51$; $j = 1, \ldots, 19$). Therefore, the true value of $\mu_i$ is the
observed value of the age-adjusted rate for each $i$. From the Poisson distributions, we
generated 10 000 values of $d_{ij}$, and obtained the observed values of the age-adjusted
rates $R_i$ using the normalized weights $w_j$, based on the 2000 US standard population,
so that $\sum_{j=1}^{J} w_j = 1$. We computed approximate 95% CIs for $\mu_i$ for each of the 51
regions using the gamma intervals of Fay and Feuer [4] and the proposed gamma, beta and
normal intervals. Additionally, we compared the F interval of Fay [9] for $\mu_i/\mu_{(-i)}$ with the
proposed F and normal intervals (with lower limits truncated at 0). We compared the
age-adjusted rate of each of the 51 regions with the age-adjusted rate for the rest of the US. Once again,
we chose the year 2002 tongue cancer mortality age-adjusted rates for the 51 regions.
The simulations were carried out assuming the 2000 standard population, generating $d_{ij}$
from independent Poisson distributions with means equal to the observed $d_{ij}$.
Table 1 gives the results of the first simulation study. Columns 2 and 3 of the table
give the observed (true) tongue cancer mortality counts and age-adjusted rates (per
100 000 mid-year population) for the 50 states, the District of Columbia, and the four
Census Bureau Regions (Northeast, Midwest, West, and South). Column 4 presents the
empirical coverage probabilities of the 95% CIs for the (simulated) age-adjusted rates
based on the gamma, modified gamma, beta, and normal approximations. Column 5
shows the observed (true) ratios of age-adjusted rates of each of the 51 regions with the
rest of the US. Column 6 gives the empirical coverage probabilities of the 95% CIs for
the (simulated) rate ratios based on the F, modified F and normal approximations.
Both the modified gamma and modified F intervals are more efficient than their counterparts
because their empirical coverage probabilities are at least 95%, but are lower
than for the gamma and the F intervals. The beta and normal intervals are slightly liberal:
their empirical coverage probabilities are less than 95% for a number of regions.
In the second simulation study, we considered the effect of randomly generated values
of $w_{ij}$ and $d_{ij}$ on the performance of the gamma, beta and normal intervals. Here, the
subscript $i$ does not play any role, and is treated as a dummy variable, but it is kept for the
sake of notational consistency. We generated 19 numbers, corresponding to the $J = 19$
age-groups, from the uniform $U(0, 1)$ distribution and standardized them (and called
them $w_{ij}$; $j = 1, \ldots, 19$) so that $\sum_{j=1}^{19} w_{ij}$ is a very small number, say, equal to $5.0 \times 10^{-6}$.
We again generated 19 numbers from $U(0, 1)$ and standardized them (and called them
$d_{ij}$; $j = 1, \ldots, 19$) so that their sum is small, $\sum_{j=1}^{19} d_{ij} = 20$. These standardized numbers
were taken to be the values of the true means $\lambda_{ij}$, $j = 1, \ldots, 19$.
Then, we simulated 10 000 values of $d_{ij}$ from the Poisson distributions with means
$\lambda_{ij}$, $j = 1, \ldots, 19$. From these, we calculated the age-adjusted rates $r_i$ and the 95% CIs
for $\mu_i$ using the gamma and normal intervals. We also calculated the variance of $w_{ij}$.
We repeated the entire process 500 times. Note that we could have standardized the sum
$\sum_{j=1}^{19} w_{ij}$ to any other small number, but we chose it to be $5.0 \times 10^{-6}$ so that it was
similar to what we have based on the 2000 US standard population and the 2002
age-adjusted rates. We also could have standardized the sum $\sum_{j=1}^{19} d_{ij}$ to any number other
than 20, possibly to 50, but we kept it at 20 to see the effect of small sample size; that is,
the small number of total mortality counts for the region $i$.
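One replication of this design can be sketched as follows. The code is a simplified illustration under stated assumptions: it evaluates only the normal interval $r_i \pm z_{\alpha/2}\sqrt{\hat\nu_i}$ with the lower limit truncated at 0, it uses a basic Poisson sampler that is adequate only for small means, and the function names, the seed and the default number of simulations are hypothetical choices rather than the settings used for the figures.

```python
import math
import random
from statistics import NormalDist


def scaled_uniforms(rng, size, target_sum):
    """Draw `size` U(0, 1) numbers and rescale them to add up to target_sum."""
    draws = [rng.random() for _ in range(size)]
    total = sum(draws)
    return [target_sum * u / total for u in draws]


def poisson_draw(rng, mean):
    """Multiplication-of-uniforms Poisson sampler; fine for small means."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def normal_interval_coverage(n_sims=2000, n_groups=19, weight_sum=5.0e-6,
                             count_sum=20.0, alpha=0.05, seed=1):
    """Empirical coverage of the truncated normal interval in one replication."""
    rng = random.Random(seed)
    w = scaled_uniforms(rng, n_groups, weight_sum)   # plays the role of w_ij
    lam = scaled_uniforms(rng, n_groups, count_sum)  # true Poisson means
    mu = sum(wj * lj for wj, lj in zip(w, lam))      # true mean of R_i
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    covered = 0
    for _ in range(n_sims):
        d = [poisson_draw(rng, lj) for lj in lam]
        r = sum(wj * dj for wj, dj in zip(w, d))
        v = sum(wj * wj * dj for wj, dj in zip(w, d))
        half_width = z * math.sqrt(v)
        if max(0.0, r - half_width) <= mu <= r + half_width:
            covered += 1
    return covered / n_sims


coverage = normal_interval_coverage()
```

Repeating the call with different seeds and recording the variance of the generated weights would mimic the 500 replications described above, on a smaller scale.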
Note that out of 10 000 intervals, corresponding to each one of the 500 replications, it
is expected that approximately 9500 intervals would contain the true mean $\mu_i$ and 500
would not; that is, it is expected that approximately 250 values of the lower limits would
be above the true mean $\mu_i$ and about the same number of the upper limits would be below
the true mean $\mu_i$. In Figures 1 and 2, we plotted the 500 values of the variance of the
normalized weights $w_{ij}$ on the x-axis, and the frequencies of the lower and upper limits of
$\mu_i$ for the Fay and Feuer [4] intervals, modified gamma, beta and normal intervals that fell,
respectively, above and below the true mean $\mu_i$, were plotted on the y-axis. In Figure 3,
we plotted both the lower and the upper limits against the variance of $w_{ij}$. Note that
the two solid lines in Figure 3 correspond to the lower and upper 95% confidence limits
for the true proportion, $p$, based on $Bin(10\,000, 0.05)$, and then rescaled by multiplying by
10 000; that is, $10\,000\,(0.05 \pm 1.96\sqrt{0.05 \times 0.95/10\,000}) \approx (457, 543)$. Thus the expected
Table 1 Comparisons of empirical coverage probabilities for 95% CIs for the age-adjusted mortality rates
of states/Census Bureau Regions and ratios of these rates to the rest of the US for tongue cancer
Columns: State/region | True count | True rate (per 100 000) | Coverage of 95% CI (rate): Gamma | Modified gamma | Beta | Normal | True ratio | Coverage of 95% CI (ratio): F | Modified F | Normal
Alaska 1 0.25 97.0 97.0 97.0 99.3 0.38 97.0 97.0 99.3
Wyoming 3 0.56 98.8 98.8 96.5 98.9 0.87 98.8 98.8 99.0
Montana 4 0.41 98.0 98.0 95.3 99.2 0.63 98.1 98.1 99.1
Vermont 4 0.58 98.1 98.1 95.0 99.3 0.89 98.3 98.3 99.2
Delaware 5 0.60 98.8 98.8 96.9 96.1 0.92 98.7 98.7 96.1
Rhode Island 5 0.45 98.6 98.6 96.6 95.2 0.69 98.4 98.4 96.1
Washington DC 6 1.06 97.9 96.4 94.1 97.2 1.62 97.9 96.8 97.0
Utah 6 0.34 97.3 96.7 94.8 96.8 0.52 97.7 96.8 96.6
Nebraska 8 0.46 96.8 96.8 95.1 94.9 0.70 96.8 96.7 95.2
South Dakota 8 0.92 97.7 97.1 95.1 96.2 1.42 97.7 97.1 96.2
New Mexico 9 0.48 96.8 96.2 94.4 96.5 0.73 97.0 96.4 96.2
West Virginia 9 0.41 97.5 97.5 95.7 96.4 0.62 97.8 97.6 96.5
North Dakota 10 1.51 97.6 97.6 96.1 95.7 2.32 97.2 97.0 95.7
Hawaii 12 0.89 96.7 96.3 94.9 95.8 1.37 96.8 96.5 95.9
Iowa 12 0.36 96.7 96.2 94.6 95.4 0.56 96.7 96.2 95.5
Idaho 13 1.05 96.5 96.1 94.6 95.6 1.61 96.5 96.0 95.5
Kansas 13 0.46 96.8 96.5 95.0 95.3 0.70 96.7 96.4 95.4
Maine 14 0.93 97.8 97.1 95.9 95.7 1.43 97.5 97.0 95.9
New Hampshire 14 1.10 97.1 96.6 95.1 95.7 1.69 97.0 96.6 95.7
Mississippi 15 0.53 96.4 96.2 95.0 95.0 0.81 96.7 96.4 95.4
South Carolina 16 0.40 96.6 96.4 95.1 95.5 0.61 96.5 96.2 95.4
Colorado 18 0.51 96.7 96.3 95.4 95.3 0.77 96.7 96.3 95.4
Oklahoma 19 0.52 96.5 96.2 95.3 95.0 0.80 96.7 96.2 95.2
Alabama 20 0.43 96.8 96.4 95.1 95.9 0.65 96.8 96.6 95.8
Arkansas 22 0.74 96.7 96.5 95.3 95.7 1.14 96.6 96.3 95.5
Kentucky 22 0.52 96.6 96.4 95.2 95.5 0.80 96.4 96.3 95.5
Louisiana 25 0.58 95.7 95.5 94.4 95.0 0.89 95.8 95.6 94.9
Arizona 26 0.47 96.4 96.1 95.0 95.7 0.72 96.3 96.2 95.5
Nevada 26 1.21 96.4 95.6 94.7 94.9 1.88 96.5 95.6 94.9
Connecticut 27 0.69 96.3 96.0 94.6 95.3 1.06 96.1 95.9 95.3
Oregon 27 0.75 96.2 96.0 95.0 95.2 1.15 96.3 96.1 95.2
Minnesota 31 0.63 96.1 95.9 95.0 95.1 0.96 95.9 95.8 95.0
Missouri 36 0.59 96.0 95.9 94.9 95.2 0.91 96.1 95.9 95.2
Georgia 38 0.52 96.3 95.9 95.2 95.1 0.79 96.3 96.0 95.1
Virginia 38 0.53 96.3 96.1 95.2 95.3 0.81 96.2 96.1 95.3
Massachusetts 39 0.56 96.2 96.0 95.2 95.3 0.85 96.3 96.1 95.3
Maryland 40 0.75 96.1 96.0 95.2 95.5 1.16 96.4 96.2 95.5
Indiana 42 0.67 96.0 95.8 95.0 95.3 1.03 96.0 95.9 95.3
Wisconsin 43 0.75 95.6 95.5 94.5 94.8 1.16 95.6 95.5 94.8
Washington 47 0.80 95.9 95.7 94.9 95.2 1.23 95.9 95.7 95.2
Tennessee 50 0.83 96.0 95.9 94.9 95.2 1.28 96.0 95.9 95.2
North Carolina 53 0.64 96.0 95.9 95.2 95.3 0.99 96.2 96.1 95.4
New Jersey 59 0.65 96.0 95.8 95.2 95.3 1.00 96.1 95.9 95.3
Illinois 65 0.53 95.6 95.6 94.9 95.0 0.80 95.5 95.5 94.9
Ohio 68 0.56 95.9 95.8 95.1 95.2 0.86 96.0 95.9 95.3
Michigan 76 0.75 95.4 95.3 94.5 94.6 1.16 95.4 95.3 94.8
Pennsylvania 86 0.60 95.9 95.8 95.0 95.3 0.91 95.9 95.8 95.2
New York 118 0.59 95.9 95.8 95.3 95.3 0.90 95.7 95.6 95.2
Texas 140 0.76 95.8 95.7 95.2 95.2 1.19 95.5 95.4 95.2
Florida 145 0.70 96.1 96.0 95.5 95.5 1.09 95.8 95.7 95.3
California 254 0.81 95.7 95.6 95.3 95.3 1.28 95.8 95.7 95.4
Northeast 366 0.62 95.7 95.6 95.3 95.2 0.94 95.7 95.7 95.2
Midwest 412 0.61 95.6 95.5 95.2 95.2 0.93 95.2 95.2 94.8
West 446 0.74 95.6 95.6 95.4 95.3 1.18 95.7 95.6 95.4
South 663 0.64 95.0 95.0 94.8 94.9 0.98 95.2 95.1 95.0
Figure 1 Number of upper limits below true mean (over 10 000 replications).
Figure 2 Number of lower limits above true mean (over 10 000 replications).
Figure 3 Number of CIs not covering true mean (over 10 000 replications).
numbers of the lower and upper limits of µithat fall above and below the true mean
are between 457 and 543. In Figure 4, we plotted the lengths of the simulated intervals
against the variance of wij.
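The reference band of (457, 543) non-covering intervals follows from the normal approximation to $Bin(10\,000, 0.05)$ and can be reproduced with a short calculation. The function name below is hypothetical, and the value 1.96 is the rounded normal quantile used in the text.

```python
import math


def noncoverage_band(n_reps=10_000, alpha=0.05, z=1.96):
    """Approximate 95% band for the number of intervals missing the true mean."""
    standard_error = math.sqrt(alpha * (1.0 - alpha) / n_reps)
    lower = n_reps * (alpha - z * standard_error)
    upper = n_reps * (alpha + z * standard_error)
    return lower, upper


band_low, band_high = noncoverage_band()
```

Rounding the two limits to whole numbers of intervals gives the band quoted in the text.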
From Figures 1–4, we observe that the modified gamma intervals have empirical coverage
of at least 95%, but slightly lower than that of the gamma intervals of Fay and Feuer [4]. The
beta and normal intervals (with lower limits truncated at 0 if they were negative) also
have empirical coverage probabilities very close to 95%, and their widths are smaller than
those of the gamma intervals. The coverage probabilities of the upper limits of both the beta and
modified gamma intervals are identical and at least 97.5%, but slightly lower than those of the
gamma intervals of Fay and Feuer [4]. The lower limits of the normal intervals are slightly
more conservative than those for gamma, while the upper limits of the normal intervals
are the least conservative. The advantage of using the modified gamma intervals over the
gamma intervals is clear from Figure 3: whereas the gamma intervals show a coverage
probability of around 97% as the variance of $w_{ij}$ increases, the modified intervals show the
coverage probability staying slightly higher than 95%. Overall, from these simulation
studies, the gamma intervals of Fay and Feuer [4] are more conservative than the proposed
gamma intervals. The beta intervals are slightly more liberal than both the modified gamma and
the gamma intervals of Fay and Feuer [4]. The normal intervals are more liberal than the
beta intervals.
In simulations, when the Poisson means were 0, as the observed $d_{ij}$ were 0, we set the
simulated values of $D_{ij}$ to be equal to 0. This is because the $D_{ij}$ are non-negative random
variables with equal means and variances, and if the mean of a $D_{ij}$ is 0 then that $D_{ij}$
is 0 with probability 1. Of course, when the $D_{ij}$ have positive means, there is a good chance
that the simulations could still result in 0 for the simulated values of $D_{ij}$. We considered
another simulation study where we took the $D_{ij}$ to be Poisson with means $n_{ij}r_{a,j}$, with
$r_{a,j} = \sum_{i=1}^{I} d_{ij}/n_j$ as the observed 2002 US age-specific mortality rates for tongue cancer.
Note that in this case, $\mu_i = \sum_{j=1}^{J} w_{ij}(n_{ij}r_{a,j}) = \sum_{j=1}^{J} w_j r_{a,j}$ is a constant, independent of both $i$ and $j$,
and the ratio of the means of two age-adjusted rates is 1. The results of this study were
very similar to those given in Table 1.
Figure 4 Length of CIs.
Next, we studied the performance of the 95% normal intervals for the ratios $\mu_i/\mu$
and the differences $\mu_i - \mu$, and their coverage probabilities were close to 0.95. As an
illustration, Figure 5 gives the plots of the number of CIs that do not contain the ratio
of the observed age-adjusted mortality rates for Arkansas to the US, for the normal
intervals, with lower limits truncated at 0 and with the $\ln(\mu_i/\mu)$ transformation. For comparison,
we also plotted these numbers for both the F and modified F intervals, ignoring
the dependence of $R_i$ on $R$. Figure 5 shows that the F intervals are very conservative,
the modified F intervals and the normal intervals based on the logarithmic transformation
have coverage probabilities close to 0.95, and the normal intervals with lower limits
truncated at 0 are slightly liberal. Of course, both the F and modified F intervals do not
incorporate the crucial assumption of the dependence between $R_i$ and $R$, and may not
be appropriate in this context.
Figure 5 Number of CIs not covering true ratio of 2002 age-adjusted mortality rates for Arkansas to US for
tongue cancer (over 10 000 replications).
We also applied the normal CIs for $\mu_i-\mu$ to the 2002 esophagus mortality data, to assess
whether the 2002 age-adjusted esophagus mortality rate for each of the 51 regions differed from
the US age-adjusted rate. When we applied the normal CIs for $\mu_i-\mu$ with correlated $R_i$
and $R$, the age-adjusted rates for Ohio and Pennsylvania were found to differ from the US rate,
as the CIs did not contain 0. When we instead applied the CIs for $\mu_i-\mu$ based on
uncorrelated $R_i$ and $R$, the rates for these two states could not be distinguished from that
of the US, as the CIs contained 0. For the other 49 regions, the two CIs produced results that
were in agreement.
4 Discussion
The advantage of the modified gamma and $F$ intervals is that they depend on all $w_{ij}$
rather than just the largest value $w_{i(J)}$. The $F$ intervals are based on the ratio of two
independent chi-squared random variables and, unlike those of Fay,$^{9}$ do not depend on the
restrictions $d_{ij}+d_{i'j}=t_j$ for all $j$. Also, the advantage of using the estimates
$\tilde\mu_i$, $\tilde\sigma_i^2$ and $\tilde\rho_i$, based on the continuity correction factor,
over their counterparts $\hat\mu_i$, $\hat\sigma_i^2$ and $\hat\rho_i$ is greater for the rare
cancer sites. Without the continuity correction, the normal and beta CIs for $\mu_i$ for the
tongue cancer site were observed to be liberal, especially for the regions with small mortality
counts.
In Figures 1–4, we reported the performance of the gamma, modified gamma, beta
and normal (with lower limits truncated at 0) intervals. We also studied, but did not
report, the performance of the ABC, DKES, and normal intervals for $\mu_i$ based on the
transformations $\ln(-\ln(R_i))$ and $\ln(R_i/(1-R_i))$. We observed that both the gamma
and modified gamma intervals always retained the nominal coverage of at least 0.95,
with the modified gamma intervals being less conservative than the gamma intervals.
None of the other intervals retained the nominal coverage. The DKES intervals were
next, with empirical coverage probabilities closer to the nominal value of 0.95, followed by
the beta intervals, the ABC intervals, the normal intervals with lower limits truncated at 0,
the normal intervals based on $\ln(-\ln(R_i))$, and the normal intervals based
on $\ln(R_i/(1-R_i))$, in that order. Similarly, for the CIs for the ratios of the means of
two (uncorrelated) age-adjusted rates, both the $F$ intervals of Fay$^{9}$ and the modified $F$
intervals retained the nominal coverage of at least 0.95, with the modified $F$ intervals being
the less conservative of the two. The normal intervals (with lower limits truncated at 0) have
coverage probabilities very close to 0.95, followed by the normal intervals based on the
transformation $\ln(R_i/R_{(-i)})$.
We may mention that the beta intervals can be viewed as an approximation to Bayesian
credible intervals for $\mu_i$. Assume that the $0<\lambda_{ij}<1$ are small so that
$D_{ij}\sim_{ind}\mathrm{Bin}(n_{ij},\lambda_{ij})$. Further assume that the $\lambda_{ij}$ are
independent with prior $\pi(\lambda_{ij})\propto 1$, $0<\lambda_{ij}<1$. Then the
posterior distributions are given by
$$\lambda_{ij}\mid n_{ij},r_{ij}\sim_{ind}\mathrm{Be}(n_{ij}r_{ij}+1,\;n_{ij}(1-r_{ij})+1)\approx\mathrm{Be}(n_{ij}r_{ij},\;n_{ij}(1-r_{ij}))$$
and we can approximate the posterior means and variances of
$\mu_i=\sum_{j=1}^{J}w_{ij}\lambda_{ij}$ by $\tilde r_i$ and $\tilde\nu_i$. The credible
intervals can then be obtained as follows. Generate a large number $G^*$ of Markov
chain Monte Carlo (MCMC) values $\lambda_{ij}^{(g)}$, $g=1,\ldots,G^*$, using the Gibbs sampler,
from the posterior distributions of the $\lambda_{ij}$, and compute the $G^*$ values of $\mu_i$,
namely $\mu_i^{(g)}=\sum_{j=1}^{J}w_{ij}\lambda_{ij}^{(g)}$. Then construct the
$100(1-\alpha)\%$ credible interval for $\mu_i$ from the empirical distribution of
$\{\mu_i^{(g)}\}$, by ordering these values from the smallest to the largest and taking the
limits to be the $100(\alpha/2)$th and $100(1-\alpha/2)$th percentiles of the ordered values.
We performed MCMC simulations and constructed the credible intervals for the 2002
age-adjusted mortality rates for tongue cancer for the 51 regions of the US and found
that the credible intervals were more liberal than the beta intervals in Table 1.
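The percentile construction just described can be sketched as follows. This is a hedged illustration rather than the authors' procedure: because the posteriors of the $\lambda_{ij}$ are independent beta distributions, the sketch draws from them directly with the standard library instead of running a Gibbs sampler, it uses the flat-prior posterior $\mathrm{Be}(d_{ij}+1,\,n_{ij}-d_{ij}+1)$ with $d_{ij}=n_{ij}r_{ij}$, and the function name, arguments and any data supplied by the caller are hypothetical.

```python
import random


def credible_interval(counts, pops, weights, alpha=0.05, draws=20000, seed=1):
    """Percentile credible interval for mu_i = sum_j weight_j * lambda_ij."""
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        mu = 0.0
        for d, n, w in zip(counts, pops, weights):
            # Posterior of lambda_ij under the flat prior: Be(d + 1, n - d + 1)
            lam = rng.betavariate(d + 1, n - d + 1)
            mu += w * lam
        sims.append(mu)
    sims.sort()  # order the simulated mu_i values from smallest to largest
    lower = sims[int(draws * alpha / 2)]
    upper = sims[int(draws * (1 - alpha / 2)) - 1]
    return lower, upper
```

The `weights` argument is whatever multiplies $\lambda_{ij}$ in the definition of $\mu_i$; increasing `draws` reduces the Monte Carlo error in the two percentiles.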
The assumption that the mortality or incidence counts are independent Poisson is used
by many (see, for example, Brillinger$^{3}$) and is perhaps a consequence of the underlying
birth/death (continuous) Poisson process model. We have not seen any analyses of
age-adjusted rates for the case of correlated $D_{ij}$. However, as pointed out by a referee,
it is quite possible for neighboring states to share socio-economic and other
factors, resulting in correlated $D_{ij}$. This is an important topic for future research.
5 Conclusion
We presented CIs for the means of the age-adjusted cancer rates for the 51 regions, $\mu_i$,
and for the US, $\mu$, for the ratios of the means $\mu_i/\mu_{i'}$, $\mu_i/\mu_{(-i)}$ and
$\mu_i/\mu$, and for the differences $\mu_i-\mu$. We developed modifications of the gamma
interval of Fay and Feuer$^{4}$ and the $F$ interval of Fay,$^{9}$ and proposed new CIs based on
the beta and normal distributions. Simulations were carried out to compare the performance of
these intervals in terms of their empirical coverage probabilities, and the results showed that
the modified gamma and $F$ intervals performed better than the gamma interval of Fay and
Feuer$^{4}$ and the $F$ interval of Fay$^{9}$ in terms of retaining the nominal coverage. The
other intervals, such as the DKES, ABC, beta, and truncated normal intervals, were shown to be
good competitors. The modified gamma and $F$ intervals are going to replace the gamma and $F$
intervals in the SEER Program. In addition, for comparing $\mu_i$ and $\mu$, the normal
intervals for $\mu_i-\mu$ that incorporate the correlation between $R_i$ and $R$ are
recommended to replace the ones that are based on uncorrelated $R_i$ and $R$ in the SEER
Program.$^{10}$ Even though the results of this paper are presented in the context of
constructing CIs for the (true) age-adjusted mortality rates based on data from the SEER
Program, they can be applied to similar data from other countries as well.
Acknowledgements
The authors would like to thank the editor and the two referees for their valuable
comments that led to a significant improvement of the original manuscript.
References
1 Jemal A, Murray T, Ward E, Samuels A,
Tiwari RC, Ghafoor A, Feuer EJ, Thun MJ.
Cancer Statistics 2005. CA A Cancer Journal
for Clinicians 2005; 55: 1–22.
2 American Cancer Society, Cancer Facts &
Figures 2005.
3 Brillinger DR. The natural variability of vital
rates and associated statistics (with
discussion), Biometrics 1986; 42: 693–734.
4 Fay MP, Feuer EJ. Confidence intervals for
directly standardized rates: a method based
on the gamma distribution. Statistics in
Medicine 1997; 16: 791–801.
5 DiCiccio T, Efron B. More accurate
confidence intervals in exponential families.
Biometrika 1992; 79: 231–45.
6 Efron B, Tibshirani RJ. An introduction to
the bootstrap. Chapman & Hall, 1993.
7 Swift MB. Simple confidence intervals for
standardized rates based on the approximate
bootstrap method. Statistics in Medicine
1995; 14: 1875–88.
8 Dobson AJ, Kuulasmaa K, Eberle E,
Scherer J. Confidence intervals for weighted
sums of Poisson parameters. Statistics in
Medicine 1991; 10: 457–62.
9 Fay MP. Approximate confidence intervals for
rate ratios from directly standardized rates
with sparse data. Communications in
Statistics – Theory and Methods 1999; 28:
2141–60.
10 Ries LAG, Eisner MP, Kosary CL, Hankey BF,
Miller BA, Clegg LX, Edwards BK. SEER
Cancer Statistics Review, 1973–1997.
National Cancer Institute (NIH Pub. No.
00-2789), 2000.
11 Johnson NL, Kotz S. Continuous univariate
distributions – I. Wiley, 1969.
12 Bickel PJ, Doksum KA. Mathematical
statistics: basic ideas and selected topics,
Holden-Day, Inc., 1977.
13 Breslow NE, Day NE. Statistical methods in
cancer research, volume II – the design and
analysis of cohort studies. Oxford University
Press, 1987.
14 Casella G, Berger RL. Statistical inference.
Wadsworth & Brooks/Cole Advanced Books
& Software, 1990.
Appendix A: Means and variances of $R_i$, $R_{(-i)}$ and $R$, and of their
ratios, and the covariance between $R_i$ and $R$
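We can rewrite $R_i$, $R_{(-i)}$ and $R$ as
$$R_i=\frac{1}{n}\sum_{j=1}^{J}\frac{w_jD_{ij}}{\xi_j\xi_{ij}};\quad
R_{(-i)}=\frac{1}{n}\sum_{j=1}^{J}\frac{w_jD_{(-i)j}}{\xi_j\xi_{(-i)j}};\quad
R=\frac{1}{n}\sum_{j=1}^{J}\frac{w_jD_j}{\xi_j}$$
Let
$$\sigma_i^2=\sum_{j=1}^{J}w_j^2\frac{\lambda_{ij}}{\xi_j\xi_{ij}};\quad
\sigma_{(-i)}^2=\sum_{j=1}^{J}w_j^2\frac{\sum_{i'\neq i}\xi_{i'j}\lambda_{i'j}}{\xi_j\xi_{(-i)j}^2};$$
$$\sigma^2=\sum_{j=1}^{J}w_j^2\frac{\sum_{i=1}^{I}\xi_{ij}\lambda_{ij}}{\xi_j};\quad
\rho_i=\sum_{j=1}^{J}w_j^2\frac{\lambda_{ij}}{\xi_j}$$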
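Then,
$$\mu_i\equiv E(R_i)=\sum_{j=1}^{J}w_j\lambda_{ij};\quad
\mu_{(-i)}\equiv E(R_{(-i)})=\sum_{j=1}^{J}w_j\frac{\sum_{i'\neq i}\xi_{i'j}\lambda_{i'j}}{\xi_{(-i)j}};$$
$$\mu\equiv E(R)=\sum_{j=1}^{J}w_j\sum_{i=1}^{I}\xi_{ij}\lambda_{ij}$$
$$\nu_i\equiv\mathrm{Var}(R_i)=\frac{\sigma_i^2}{n};\quad
\nu_{(-i)}\equiv\mathrm{Var}(R_{(-i)})=\frac{\sigma_{(-i)}^2}{n};$$
$$\nu\equiv\mathrm{Var}(R)=\frac{\sigma^2}{n};\quad
\mathrm{Cov}(R_i,R)=\frac{\rho_i}{n}$$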
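Using the delta-method, the means and variances of the ratios $R_i/R_{i'}$, $R_i/R_{(-i)}$
and $R_i/R$ are given by
$$E\!\left(\frac{R_i}{R_{i'}}\right)\approx\frac{\mu_i}{\mu_{i'}};\quad
E\!\left(\frac{R_i}{R_{(-i)}}\right)\approx\frac{\mu_i}{\mu_{(-i)}};\quad
E\!\left(\frac{R_i}{R}\right)\approx\frac{\mu_i}{\mu}$$
$$\mathrm{Var}\!\left(\frac{R_i}{R_{i'}}\right)\approx\frac{\sigma_i^2\mu_{i'}^2+\sigma_{i'}^2\mu_i^2}{n\mu_{i'}^4};\quad
\mathrm{Var}\!\left(\frac{R_i}{R_{(-i)}}\right)\approx\frac{\sigma_i^2\mu_{(-i)}^2+\sigma_{(-i)}^2\mu_i^2}{n\mu_{(-i)}^4}$$
$$\mathrm{Var}\!\left(\frac{R_i}{R}\right)\approx\frac{\sigma_i^2\mu^2+\sigma^2\mu_i^2-2\rho_i\mu_i\mu}{n\mu^4}$$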
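As a brief worked step (a sketch of the standard first-order delta-method argument, not text from the original derivation), the variance of $R_i/R$ follows from a Taylor expansion of $g(x,y)=x/y$ about $(\mu_i,\mu)$, using $\mathrm{Var}(R_i)=\sigma_i^2/n$, $\mathrm{Var}(R)=\sigma^2/n$ and $\mathrm{Cov}(R_i,R)=\rho_i/n$:

```latex
\begin{align*}
\frac{\partial g}{\partial x}\Big|_{(\mu_i,\mu)} &= \frac{1}{\mu}, \qquad
\frac{\partial g}{\partial y}\Big|_{(\mu_i,\mu)} = -\frac{\mu_i}{\mu^{2}},\\[4pt]
\mathrm{Var}\!\left(\frac{R_i}{R}\right)
 &\approx \frac{1}{\mu^{2}}\,\mathrm{Var}(R_i)
   + \frac{\mu_i^{2}}{\mu^{4}}\,\mathrm{Var}(R)
   - 2\,\frac{\mu_i}{\mu^{3}}\,\mathrm{Cov}(R_i,R)\\[4pt]
 &= \frac{\sigma_i^{2}\mu^{2} + \sigma^{2}\mu_i^{2} - 2\rho_i\,\mu_i\,\mu}{n\,\mu^{4}} .
\end{align*}
```

The same expansion with a zero covariance term, and with $R$ replaced by $R_{i'}$ or $R_{(-i)}$, gives the other two variances, since those pairs of rates are built from disjoint sets of independent counts.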
Appendix B: ABC and DKES intervals
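The ABC intervals are$^{4}$
$$L_{ABC}(\mu_i;\alpha)=\hat\mu_i+\frac{z_{0i}-z_{\alpha/2}}{\{1-a_i[z_{0i}-z_{\alpha/2}]\}^2}\,\frac{\hat\sigma_i}{\sqrt{n}}$$
$$U_{ABC}(\mu_i;\alpha)=\hat\mu_i+\frac{z_{0i}+z_{\alpha/2}}{\{1-a_i[z_{0i}+z_{\alpha/2}]\}^2}\,\frac{\hat\sigma_i}{\sqrt{n}}$$
where $z_{\alpha/2}=\Phi^{-1}(1-\alpha/2)$ is the upper $\alpha/2$th percentile point of the
standard normal distribution function $\Phi$, and
$a_i=z_{0i}=\left(\sum_{j=1}^{J}w_{ij}^3d_{ij}\right)/(6\hat\nu_i^{3/2})$. The DKES intervals are$^{4}$
$$L_{DKES}(\mu_i;\alpha)=\hat\mu_i+\frac{\hat\sigma_i}{\sqrt{n\sum_{j=1}^{J}d_{ij}}}\left[\frac{1}{2}\left(\chi^2_{2\sum_{j=1}^{J}d_{ij}}\right)^{-1}\!\left(\frac{\alpha}{2}\right)-\sum_{j=1}^{J}d_{ij}\right]$$
$$U_{DKES}(\mu_i;\alpha)=\hat\mu_i+\frac{\hat\sigma_i}{\sqrt{n\sum_{j=1}^{J}d_{ij}}}\left[\frac{1}{2}\left(\chi^2_{2\left(1+\sum_{j=1}^{J}d_{ij}\right)}\right)^{-1}\!\left(1-\frac{\alpha}{2}\right)-\sum_{j=1}^{J}d_{ij}\right]$$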
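A minimal sketch of the ABC limits is given below. It rests on assumptions that should be checked against the definitions earlier in the paper: the variance estimate is taken to be $\hat\nu_i=\sum_j w_{ij}^2d_{ij}$, so that $\hat\sigma_i/\sqrt{n}=\sqrt{\hat\nu_i}$, and the function name and arguments are hypothetical. The DKES limits are not sketched because they require chi-squared quantiles, which the Python standard library does not provide.

```python
import math
from statistics import NormalDist


def abc_interval(counts, weights, alpha=0.05):
    """ABC limits for an age-adjusted rate sum_j w_ij * d_ij."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    rate = sum(w * d for w, d in zip(weights, counts))
    var = sum(w ** 2 * d for w, d in zip(weights, counts))  # assumed nu_i-hat
    # Acceleration and bias-correction constants coincide: a_i = z_0i
    a = sum(w ** 3 * d for w, d in zip(weights, counts)) / (6 * var ** 1.5)

    def limit(q):
        # q is -z for the lower limit and +z for the upper limit
        return rate + (a + q) / (1 - a * (a + q)) ** 2 * math.sqrt(var)

    return limit(-z), limit(z)
```

Because $a_i$ is positive whenever some count is positive, the upper limit sits farther from the point estimate than the lower limit, reflecting the right skew of the weighted Poisson sum.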
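Appendix C: Asymptotic normality and confidence intervals based
on $R_{ij}$
Let $\mathbf{R}=(R_{11},\ldots,R_{1J},\ldots,R_{I1},\ldots,R_{IJ})^T$,
$\bar{\mathbf{R}}=(R_1,\ldots,R_I,R)^T$, $\boldsymbol\mu=(\mu_1,\ldots,\mu_I,\mu)^T$
and let $\Sigma=((\sigma_{ii'}))$ be the $(I+1)\times(I+1)$ matrix with $\sigma_{ii}=\sigma_i^2$,
$\sigma_{i,I+1}=\sigma_{I+1,i}=\rho_i$ and $\sigma_{ii'}=0$ for $i\neq i'$, $1\le i,i'\le I$.
Here the superscript $T$ denotes the transpose. Since $\bar{\mathbf{R}}$ can be expressed
as $\bar{\mathbf{R}}=A\mathbf{R}$ for an appropriately defined matrix $A$, we have
$$\sqrt{n}(\bar{\mathbf{R}}-\boldsymbol\mu)\longrightarrow N_{(I+1)}(\mathbf{0},\Sigma)$$
where $\Sigma=A[\mathrm{Cov}(\mathbf{R})]A^T$ and $N_p(\mathbf{b},B)$ denotes a $p$-dimensional
multivariate normal distribution.
Thus for any non-null $(I+1)$-column vector $\mathbf{a}$,
$$\sqrt{n}\,\mathbf{a}^T(\bar{\mathbf{R}}-\boldsymbol\mu)\longrightarrow N(0,\mathbf{a}^T\Sigma\mathbf{a})$$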
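In particular, by choosing $\mathbf{a}$ appropriately, we have
$$R_i=\sum_{j=1}^{J}w_jR_{ij}\sim_{ind}AN\!\left(\mu_i,\frac{\sigma_i^2}{n}\right)$$
$$R_{(-i)}=\sum_{j=1}^{J}w_j\frac{\sum_{i'\neq i}\xi_{i'j}R_{i'j}}{\xi_{(-i)j}}\sim AN\!\left(\mu_{(-i)},\frac{\sigma_{(-i)}^2}{n}\right)$$
$$R=\sum_{j=1}^{J}w_j\sum_{i=1}^{I}\xi_{ij}R_{ij}\sim AN\!\left(\mu,\frac{\sigma^2}{n}\right)$$
$$\frac{R_i}{R_{i'}}\sim AN\!\left(\frac{\mu_i}{\mu_{i'}},\frac{\sigma_i^2\mu_{i'}^2+\sigma_{i'}^2\mu_i^2}{n\mu_{i'}^4}\right);\quad
\frac{R_i}{R_{(-i)}}\sim AN\!\left(\frac{\mu_i}{\mu_{(-i)}},\frac{\sigma_i^2\mu_{(-i)}^2+\sigma_{(-i)}^2\mu_i^2}{n\mu_{(-i)}^4}\right)$$
$$\frac{R_i}{R}\sim AN\!\left(\frac{\mu_i}{\mu},\frac{\sigma_i^2\mu^2+\sigma^2\mu_i^2-2\rho_i\mu_i\mu}{n\mu^4}\right)$$
$$(R_i-R)\sim AN\!\left(\mu_i-\mu,\frac{\sigma_i^2+\sigma^2-2\rho_i}{n}\right)$$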
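The corresponding $100(1-\alpha)\%$ CIs are
$$\mu_i=\left[\hat\mu_i\pm z_{\alpha/2}\frac{\hat\sigma_i}{\sqrt{n}}\right]\vee 0;\quad
\mu=\left[\hat\mu\pm z_{\alpha/2}\frac{\hat\sigma}{\sqrt{n}}\right]\vee 0$$
$$\frac{\mu_i}{\mu_{i'}}=\left[\frac{\hat\mu_i}{\hat\mu_{i'}}\pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2\hat\mu_{i'}^2+\hat\sigma_{i'}^2\hat\mu_i^2}{n\hat\mu_{i'}^4}}\right]\vee 0$$
$$\frac{\mu_i}{\mu}=\left[\frac{\hat\mu_i}{\hat\mu}\pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2\hat\mu^2+\hat\sigma^2\hat\mu_i^2-2\hat\rho_i\hat\mu_i\hat\mu}{n\hat\mu^4}}\right]\vee 0$$
$$\frac{\mu_i}{\mu_{(-i)}}=\left[\frac{\hat\mu_i}{\hat\mu_{(-i)}}\pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2\hat\mu_{(-i)}^2+\hat\sigma_{(-i)}^2\hat\mu_i^2}{n\hat\mu_{(-i)}^4}}\right]\vee 0$$
$$\mu_i-\mu=\hat\mu_i-\hat\mu\pm z_{\alpha/2}\frac{\sqrt{\hat\sigma_i^2+\hat\sigma^2-2\hat\rho_i}}{\sqrt{n}}$$
where $a\vee b=\max(a,b)$.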
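The normal intervals for the difference and the ratio of a regional rate to the US rate can be sketched as below. This is an illustrative reading of the asymptotic results in this appendix, with hypothetical function names; the arguments are the estimates $\hat\mu_i$, $\hat\mu$, $\hat\sigma_i^2$, $\hat\sigma^2$, $\hat\rho_i$ and $n$, and the truncation is applied to the lower limit only.

```python
import math
from statistics import NormalDist


def diff_ci(mu_i, mu, s2_i, s2, rho_i, n, alpha=0.05):
    """Normal CI for mu_i - mu; rho_i = 0 gives the uncorrelated version."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt((s2_i + s2 - 2 * rho_i) / n)
    d = mu_i - mu
    return d - half, d + half


def ratio_ci(mu_i, mu, s2_i, s2, rho_i, n, alpha=0.05):
    """Normal CI for mu_i / mu with the lower limit truncated at 0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    var = (s2_i * mu ** 2 + s2 * mu_i ** 2 - 2 * rho_i * mu_i * mu) / (n * mu ** 4)
    half = z * math.sqrt(var)
    r = mu_i / mu
    return max(r - half, 0.0), r + half
```

Because a positive $\hat\rho_i$ shrinks the variance of the difference, an interval computed with the covariance term can exclude 0 while the interval computed with $\hat\rho_i=0$ from the same estimates still contains it.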
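Since $0\le R_i\le 1$ and $0\le R_i/R_{(-i)}\le\infty$ with probability 1, the following
transformations are commonly used to map the range of these random variables to
$(-\infty,\infty)$. The corresponding asymptotic normality results are:
$$\ln(-\ln R_i)\sim AN\!\left(\ln(-\ln(\mu_i)),\frac{\sigma_i^2}{n(\mu_i\ln\mu_i)^2}\right)$$
$$\mathrm{logit}(R_i)\equiv\ln\frac{R_i}{1-R_i}\sim AN\!\left(\ln\frac{\mu_i}{1-\mu_i},\frac{\sigma_i^2}{n(\mu_i(1-\mu_i))^2}\right)$$
$$\ln\frac{R_i}{R_{(-i)}}\sim AN\!\left(\ln\frac{\mu_i}{\mu_{(-i)}},\frac{1}{n}\left(\frac{\sigma_i^2}{\mu_i^2}+\frac{\sigma_{(-i)}^2}{\mu_{(-i)}^2}\right)\right)$$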
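Based on these transformations, the CIs for $\mu_i$, $\mu_i/\mu_{(-i)}$ and $\mu_i/\mu$ are given as follows:
I)
$$\mu_i=\exp\left\{-\exp\left[\ln(-\ln(\hat\mu_i))\pm z_{\alpha/2}\frac{\hat\sigma_i}{(\hat\mu_i\ln\hat\mu_i)\sqrt{n}}\right]\right\}$$
II)
$$\mu_i=\left[1+\exp\left\{-\left[\ln\frac{\hat\mu_i}{1-\hat\mu_i}\pm z_{\alpha/2}\frac{\hat\sigma_i}{(\hat\mu_i(1-\hat\mu_i))\sqrt{n}}\right]\right\}\right]^{-1}$$
III)
$$\frac{\mu_i}{\mu_{(-i)}}=\exp\left\{\ln\frac{\hat\mu_i}{\hat\mu_{(-i)}}\pm z_{\alpha/2}\left(\frac{1}{n}\left(\frac{\hat\sigma_i^2}{\hat\mu_i^2}+\frac{\hat\sigma_{(-i)}^2}{\hat\mu_{(-i)}^2}\right)\right)^{1/2}\right\}$$
IV)
$$\frac{\mu_i}{\mu}=\exp\left\{\ln\frac{\hat\mu_i}{\hat\mu}\pm z_{\alpha/2}\frac{\hat\mu}{\hat\mu_i}\left(\frac{1}{n}\,\frac{\hat\sigma_i^2\hat\mu^2+\hat\sigma^2\hat\mu_i^2-2\hat\rho_i\hat\mu\hat\mu_i}{\hat\mu^4}\right)^{1/2}\right\}$$
The CIs in III) above were also derived by Breslow and Day.$^{13}$ Note that we will use
$\tilde\mu_i$, $\tilde\nu_i$ and $\tilde\rho_i$ instead of $\hat\mu_i$, $\hat\nu_i$ and $\hat\rho_i$.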
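The back-transformed intervals I), II) and IV) can be sketched as follows; III) is recovered from the last function by setting the covariance to 0 and passing the estimates for $R_{(-i)}$ in place of those for $R$. This is an illustration under assumptions: the function names are hypothetical, the inputs are whichever estimates ($\hat{\ }$ or $\tilde{\ }$ versions) the user has computed, and the rate must lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def _z(alpha):
    return NormalDist().inv_cdf(1 - alpha / 2)


def loglog_ci(mu, s2, n, alpha=0.05):
    """Interval I: back-transform of ln(-ln(mu)), for 0 < mu < 1."""
    se = math.sqrt(s2 / n) / abs(mu * math.log(mu))
    centre = math.log(-math.log(mu))
    # exp(-exp(x)) is decreasing in x, so the "+" branch gives the lower limit
    return (math.exp(-math.exp(centre + _z(alpha) * se)),
            math.exp(-math.exp(centre - _z(alpha) * se)))


def logit_ci(mu, s2, n, alpha=0.05):
    """Interval II: back-transform of ln(mu / (1 - mu))."""
    se = math.sqrt(s2 / n) / (mu * (1 - mu))
    centre = math.log(mu / (1 - mu))
    return (1 / (1 + math.exp(-(centre - _z(alpha) * se))),
            1 / (1 + math.exp(-(centre + _z(alpha) * se))))


def log_ratio_ci(mu_i, mu, s2_i, s2, rho_i, n, alpha=0.05):
    """Interval IV for mu_i / mu; rho_i = 0 gives the form used in III."""
    var_ratio = (s2_i * mu ** 2 + s2 * mu_i ** 2
                 - 2 * rho_i * mu * mu_i) / (n * mu ** 4)
    se = (mu / mu_i) * math.sqrt(var_ratio)  # delta method for the log ratio
    centre = math.log(mu_i / mu)
    return (math.exp(centre - _z(alpha) * se),
            math.exp(centre + _z(alpha) * se))
```

All three intervals stay inside the natural range of the parameter by construction, which is the reason for transforming before applying the normal approximation.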
Appendix D: Beta approximations of $R_{ij}$ and $R_i$
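Using the relation$^{14}$
$$\sum_{k=0}^{x}\binom{n}{k}p^k(1-p)^{n-k}=\frac{\Gamma(n+1)}{\Gamma(x+1)\Gamma(n-x)}\int_0^{1-p}t^{n-x-1}(1-t)^x\,dt$$
$$=\int_0^{1-p}B(t\mid n-x,\,x+1)\,dt$$
$$=\int_p^{1}B(t\mid x+1,\,n-x)\,dt,$$
it then follows that
$$P(R_{ij}\ge r_{ij}\mid(D_{ij}+\bar D_{ij})=n_{ij},\lambda_{ij})=\int_0^{\lambda_{ij}}B(t\mid n_{ij}r_{ij}+1,\;n_{ij}(1-r_{ij}))\,dt$$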
Another heuristic argument for the beta approximation for $R_{ij}$ is based on the gamma
or chi-squared approximation of a Poisson distribution. Let $\chi_k^2$ and $\alpha\chi_k^2$
denote a chi-squared random variable with $k$ degrees of freedom and a re-scaled (by a factor
$\alpha>0$) $\chi_k^2$ random variable, respectively. Note that
$\chi_k^2\overset{d}{=}G(k/2,1)$, and that if $\chi_r^2$ and $\chi_s^2$ are independent, then
$\chi_r^2/(\chi_r^2+\chi_s^2)\overset{d}{=}\chi_r^2/\chi_{r+s}^2\sim\mathrm{Be}(r/2,s/2)$, and
$\chi_r^2/(\chi_r^2+\chi_s^2)$ and $\chi_r^2+\chi_s^2$ are independent with
$\chi_r^2+\chi_s^2\overset{d}{=}\chi_{r+s}^2$.
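Since $D_{ij}$ and $\bar D_{ij}$ are independent, distributed as $\mathrm{Po}(n_{ij}\lambda_{ij})$
and $\mathrm{Po}(n_{ij}(1-\lambda_{ij}))$, respectively, and their distributions can be
approximated by the independent chi-squared distributions $\tfrac12\chi^2_{2([n_{ij}r_{ij}]+1)}$
and $\tfrac12\chi^2_{2(n_{ij}-[n_{ij}r_{ij}])}$, where $[x]$ denotes the integer value of
$x$, we have
$$\frac{D_{ij}}{D_{ij}+\bar D_{ij}}\approx\frac{\tfrac12\chi^2_{2([n_{ij}r_{ij}]+1)}}{\tfrac12\chi^2_{2([n_{ij}r_{ij}]+1)}+\tfrac12\chi^2_{2(n_{ij}-[n_{ij}r_{ij}])}}
=\frac{\chi^2_{2([n_{ij}r_{ij}]+1)}}{\chi^2_{2([n_{ij}r_{ij}]+1)}+\chi^2_{2(n_{ij}-[n_{ij}r_{ij}])}}\sim\mathrm{Be}([n_{ij}r_{ij}]+1,\;n_{ij}-[n_{ij}r_{ij}]).$$
Thus, $R_{ij}\sim\mathrm{Be}([n_{ij}r_{ij}]+1,\;n_{ij}-[n_{ij}r_{ij}])$. We can now approximate
the distribution of $R_i=\sum_{j=1}^{J}w_jR_{ij}$ by a beta distribution,
$\mathrm{Be}(\hat a_i,\hat b_i)$, where
$$\hat a_i=\tilde r_i\left(\frac{\tilde r_i(1-\tilde r_i)}{\tilde\nu_i}-1\right),\quad
\hat b_i=(1-\tilde r_i)\left(\frac{\tilde r_i(1-\tilde r_i)}{\tilde\nu_i}-1\right)$$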