
Efficient interval estimation for age-adjusted cancer rates


The age-adjusted cancer rates are defined as the weighted average of the age-specific cancer rates, where the weights are positive, known, and normalized so that their sum is 1. Fay and Feuer developed a confidence interval for a single age-adjusted rate based on the gamma approximation. Fay used the gamma approximations to construct an F interval for the ratio of two age-adjusted rates. Modifications of the gamma and F intervals are proposed and a simulation study is carried out to show that these modified gamma and modified F intervals are more efficient than the gamma and F intervals, respectively, in the sense that the proposed intervals have empirical coverage probabilities less than or equal to their counterparts, and that they also retain the nominal level. The normal and beta confidence intervals for a single age-adjusted rate are also provided, but they are shown to be slightly liberal. Finally, for comparing two correlated age-adjusted rates, the confidence intervals for the difference and for the ratio of the two age-adjusted rates are derived incorporating the correlation between the two rates. The proposed gamma and F intervals and the normal intervals for the correlated age-adjusted rates are recommended to be implemented in the Surveillance, Epidemiology and End Results Program of the National Cancer Institute.
Statistical Methods in Medical Research 2006; 15: 547–569
Ram C Tiwari (National Cancer Institute, NIH, Bethesda, MD, USA), Limin X Clegg (Office of Healthcare Inspections, OIG, Department of Veterans Affairs, Washington, DC, USA) and Zhaohui Zou (Information Management Services, Inc., Silver Spring, MD, USA)
1 Introduction
Despite rapid advances in medicine, cancer continues to be a major public health concern in the US and around the world. The total number of deaths due to cancer continues to rise, even though the age-adjusted mortality rates for many common cancer sites continue to decline [1]. Many public and private agencies dealing with cancer and related issues depend on these statistics for planning and resource allocation. Such figures have important social and economic ramifications, from deciding which programs get funded to deciding how funds are allocated among various programs. Having reliable and accurate confidence intervals (CIs) for the means of the age-adjusted cancer mortality and incidence rates for recent years is very important for everyone concerned. The higher the coverage probabilities of the CIs, the more conservative the CIs are. Therefore, a desirable property of these CIs is that, while retaining the nominal level, they have coverage probabilities as close to the nominal level as possible.
In the US, the data on cancer mortality are obtained from death certificates. Due to administrative and procedural delays, these data become fully available to the public from the National Center for Health Statistics (NCHS) after approximately three years. The cancer incidence and mortality data are also available from the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute (NCI). The SEER Program is an authoritative source for the cancer incidence and survival data in the US. Population data are available from the US Census Bureau. The American Cancer Society (ACS) publishes reports on cancer trends in its widely circulated annual publication, Cancer Facts & Figures [2], which is also available online: http://www.cancer.org/.

Address for correspondence: Limin Clegg, Office of Healthcare Inspections, VA OIG, 801 I Street, NW, Room 1013, Washington, DC 20001, USA. E-mail: lin_clegg@va.gov
© 2006 SAGE Publications 10.1177/0962280206070621

The state-level age-adjusted cancer (incidence or mortality) rates are given by

$$r_i = \sum_{j=1}^{J} w_j \frac{d_{ij}}{n_{ij}}, \quad i = 1, \ldots, I,$$

where $d_{ij}$ and $n_{ij}$ are the number of cancer (incidence or mortality) counts and the count of mid-year population for the age-group $j$ and the state $i$, respectively, and the $w_j$ are the normalized proportions of mid-year population for the age-group $j$ in the standard population, so that $\sum_{j=1}^{J} w_j = 1$. In the SEER Program, for each of the 51 regions (50 states and Washington, DC) in the US, there are 19 standard age-groups consisting of <1, 1–4, 5–9, ..., 85+ years. The US-level age-adjusted cancer (incidence or mortality) rates are given by

$$r = \sum_{j=1}^{J} w_j \frac{d_j}{n_j},$$

with $d_j = \sum_{i=1}^{I} d_{ij}$ and $n_j = \sum_{i=1}^{I} n_{ij}$. The SEER Program contains age-adjusted mortality rates, based on the 2000 US standard population, for the US and for each of the 51 regions by cancer sites. The age-adjusted mortality rates for a selected number of cancer sites and a number of countries in the world are also reported in the Cancer Facts & Figures publication [2]. These age-adjusted rates are based on the World Health Organization's world standard population. Thus, the results of this paper, even though discussed in the context of the age-adjusted mortality rates for the US, apply to similar data sets for other countries.
For each $i$ ($i = 1, \ldots, I$), let $d_{(i)j} = d_j - d_{ij}$ and $n_{(i)j} = n_j - n_{ij}$ and define

$$r_{(i)} = \sum_{j=1}^{J} w_j \frac{d_{(i)j}}{n_{(i)j}}$$

to be the age-adjusted rate for the rest of the US after deleting the region $i$.
Let $D_{ij}$, $D_j$, $D_{(i)j}$, $R_i$, $R_{(i)}$ and $R$ denote the random variables whose realizations are $d_{ij}$, $d_j$, $d_{(i)j}$, $r_i$, $r_{(i)}$ and $r$, respectively. We assume that the $D_{ij}$ are independent Poisson random variables [3] with parameters $\lambda_{ij}$, that is, $D_{ij} \overset{\text{ind}}{\sim} \mathrm{Po}(n_{ij}\lambda_{ij})$. Note that, by the moment generating function, $D_{(i)j} \sim \mathrm{Po}\left(\sum_{i' \neq i} n_{i'j}\lambda_{i'j}\right)$ and $D_j \sim \mathrm{Po}\left(\sum_{i=1}^{I} n_{ij}\lambda_{ij}\right)$.
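Let $\xi_{ij} = n_{ij}/n_j$, $\xi_{(i)j} = \sum_{i' \neq i} \xi_{i'j}$ and $\xi_j = n_j/n$, where $n = \sum_{j=1}^{J} n_j = \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}$. Let $\mu_i$, $\mu_{(i)}$, $\mu$, $\nu_i = \sigma_i^2/n$, $\nu_{(i)} = \sigma_{(i)}^2/n$ and $\nu = \sigma^2/n$ be the means and variances of $R_i$, $R_{(i)}$ and $R$, respectively, and let $\rho_i/n$ be $\mathrm{Cov}(R_i, R)$, where their explicit expressions are derived in Appendix A. Let $w_{ij} = w_j/n_{ij}$ and define the estimates of $\mu_i$, $\mu_{(i)}$, $\mu$, $\sigma_i^2$, $\sigma_{(i)}^2$, $\sigma^2$ and $\rho_i$ as

$$\hat\mu_i = r_i; \quad \hat\mu_{(i)} = r_{(i)}; \quad \hat\mu = r$$

$$\hat\sigma_i^2 = n \sum_{j=1}^{J} w_{ij}^2 d_{ij}; \quad \hat\sigma_{(i)}^2 = n \sum_{j=1}^{J} w_j^2 \frac{d_{(i)j}}{n_{(i)j}^2}$$

$$\hat\sigma^2 = n \sum_{j=1}^{J} w_j^2 \frac{d_j}{n_j^2}; \quad \hat\rho_i = n \sum_{j=1}^{J} \frac{w_{ij} w_j}{n_j} d_{ij}$$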
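For a rare cancer site, the observed total counts $d_i$ are very small, with $d_{ij} = 0$ plausibly for several $j$, so the value of $r_i$ is either close to 0 or equal to 0. As we will see subsequently, when $r_i = 0$, the gamma interval of Fay and Feuer [4] is not defined. To avoid such situations, we introduce a correction factor, which amounts to distributing a count of 1 uniformly to all $J$ categories, and hence adding $1/J$, the expected value under a multinomial distribution with parameters 1 and cell probabilities $1/J$, to $d_{ij}$, $j = 1, \ldots, J$, in the calculation of the estimates of $\mu_i$, $\sigma_i^2$ and $\rho_i$. We redefine $r_i$ as

$$\tilde r_i = \sum_{j=1}^{J} w_{ij}\left(d_{ij} + \frac{1}{J}\right) = r_i + \bar w_i$$

where $\bar w_i = (1/J)\sum_{j=1}^{J} w_{ij}$, and modify the estimates of $\mu_i$, $\sigma_i^2$, $\sigma_{(i)}^2$ and $\rho_i$ accordingly by replacing $d_{ij}$ by $(d_{ij} + 1/J)$. Thus,

$$\tilde\mu_i = \tilde r_i; \quad \tilde\sigma_i^2 = n \sum_{j=1}^{J} w_{ij}^2\left(d_{ij} + \frac{1}{J}\right); \quad \tilde\rho_i = n \sum_{j=1}^{J} \frac{w_{ij} w_j}{n_j}\left(d_{ij} + \frac{1}{J}\right)$$

Note that $\tilde r_i \approx r_i$ for common cancer sites as $\bar w_i \approx 0$. Let

$$\hat\nu_i = \frac{\hat\sigma_i^2}{n}; \quad \tilde\nu_i = \frac{\tilde\sigma_i^2}{n}; \quad \hat\nu = \frac{\hat\sigma^2}{n}$$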
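The estimates defined so far reduce to a few weighted sums per region. The following is a minimal sketch of that arithmetic for a single region, not the authors' code; the function name `age_adjusted_estimates` and the dictionary keys are hypothetical, and the variance terms are returned already divided by $n$ (that is, as $\hat\nu_i$ and $\tilde\nu_i$).

```python
def age_adjusted_estimates(d, n, w):
    """Point and variance estimates for one region.

    d: observed counts d_ij per age-group
    n: mid-year populations n_ij per age-group
    w: standard-population weights w_j (assumed to sum to 1)
    """
    J = len(d)
    wij = [wj / nij for wj, nij in zip(w, n)]              # w_ij = w_j / n_ij
    r = sum(a * x for a, x in zip(wij, d))                 # r_i
    nu_hat = sum(a * a * x for a, x in zip(wij, d))        # nu_hat_i = sigma_hat_i^2 / n
    w_bar = sum(wij) / J                                   # mean weight w_bar_i
    # Correction: add 1/J to every cell before forming the sums.
    r_tilde = sum(a * (x + 1.0 / J) for a, x in zip(wij, d))
    nu_tilde = sum(a * a * (x + 1.0 / J) for a, x in zip(wij, d))
    return {"r": r, "nu_hat": nu_hat, "w_bar": w_bar,
            "r_tilde": r_tilde, "nu_tilde": nu_tilde}


if __name__ == "__main__":
    # Toy data with three age-groups (illustrative values only).
    est = age_adjusted_estimates([2, 0, 4], [1000, 2000, 4000], [0.5, 0.25, 0.25])
    print(est)
```

By construction `r_tilde` equals `r + w_bar`, which matches the identity $\tilde r_i = r_i + \bar w_i$ stated in the text.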
The objectives of this paper include the construction of CIs for parameters such as i) the mean $\mu_i$ of the age-adjusted rate for the region $i$; ii) the mean $\mu$ of the age-adjusted rate for the US; iii) the ratio of the mean age-adjusted rates $\mu_i/\mu_{i'}$ for region $i$ to region $i'$; iv) the ratio of the mean age-adjusted rates $\mu_i/\mu_{(i)}$ for region $i$ to the rest of the US; v) the ratio of the mean age-adjusted rates $\mu_i/\mu$ for region $i$ to the US; and vi) the difference of the mean age-adjusted rates $\mu_i - \mu$ between region $i$ and the US. Fay and Feuer [4] derived a CI for $\mu_i$ (or $\mu$) assuming that a mixture of Poisson distributions can be approximated by a gamma distribution and compared the performance of the gamma intervals with the approximate bootstrap confidence (ABC) intervals [5–7] and the 'chi-squared' intervals of Dobson et al. [8] through simulations. They observed that the gamma intervals retained at least the nominal coverage and were more conservative than the ABC intervals and chi-squared intervals. We propose a modification of the gamma interval for $\mu_i$ (or $\mu$) developed by Fay and Feuer [4] and derive new CIs for $\mu_i$ (and $\mu$) based on the beta and normal approximations of $R_i$ (and $R$).
Fay [9] used the gamma approximation of Fay and Feuer [4] and developed a CI, based on an approximate $F$ distribution, for the ratio of two age-adjusted rates that can be applied to $\mu_i/\mu_{i'}$ and $\mu_i/\mu_{(i)}$, but not to $\mu_i/\mu$, as the age-adjusted rate for the US involves the counts from the region $i$. We also propose a modification of the $F$ interval of Fay [9]. We use the normal approximations of $R_i/R_{i'}$, $R_i/R_{(i)}$, $R_i/R$ and $R_i - R$, taking into account the correlation between $R_i$ and $R$, and construct the CIs for $\mu_i/\mu_{i'}$, $\mu_i/\mu_{(i)}$, $\mu_i/\mu$ and $\mu_i - \mu$. It is important to mention that, for comparing the state and US level age-adjusted rates, the current procedure [10] is to use the normal CI for $\mu_i - \mu$ based on $\rho_i = 0$. For simulations, we use the observed age-adjusted mortality rates for the 51 regions and the US for year 2002 from the SEER Program for a rare cancer site, tongue cancer.
The rest of the paper is organized as follows. In Section 2, we briefly review the works of Fay and Feuer [4] and Fay [9] and present the modified gamma and $F$ intervals. We also derive the CIs for the ratio of the means of two age-adjusted rates, namely the age-adjusted rates of any two regions, any region to the rest of the country, and any region to the entire country, and for the difference of the means of the age-adjusted rates between a region and the country. Simulations are carried out in Section 3, followed by a discussion of our findings and the conclusions.
2 Confidence intervals for age-adjusted rates
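2.1 Gamma and F approximations
Note that if $X \sim \mathrm{Po}(\theta)$, then for an integer $x \geq 0$,

$$P(X \geq x \mid \theta) = \int_0^{\theta} f_Z(z \mid x, 1)\,dz$$

where $Z \sim G(x, 1) \overset{d}{=} \tfrac{1}{2}\chi^2_{2x}$ and, in general, $G(\alpha, \beta) \overset{d}{=} (\beta/2)\chi^2_{2\alpha}$ (allowing non-integer degrees of freedom) has density

$$f_Z(x \mid \alpha, \beta) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} \exp\left(-\frac{x}{\beta}\right) x^{\alpha - 1}, \quad x > 0$$

with mean $E(Z) = \alpha\beta$ and variance $\mathrm{Var}(Z) = \alpha\beta^2$. Let $x$ be the observed value of $X$ and let $(L(x;\alpha), U(x;\alpha))$ denote the $100(1-\alpha)\%$ CI for $\theta$, where $L(x;\alpha)$ is obtained by solving the equation

$$P(X \geq x \mid \theta = L(x;\alpha)) = \frac{\alpha}{2}$$

and $U(x;\alpha)$ is obtained by solving

$$P(X \leq x \mid \theta = U(x;\alpha)) = \frac{\alpha}{2}$$

or equivalently by solving

$$P(X > x \mid \theta = U(x;\alpha)) = P(X \geq x + 1 \mid \theta = U(x;\alpha)) = 1 - \frac{\alpha}{2}$$

Thus, $L(x;\alpha) = \tfrac{1}{2}(\chi^2_{2x})^{-1}(\alpha/2)$ and $U(x;\alpha) = \tfrac{1}{2}(\chi^2_{2(x+1)})^{-1}(1 - \alpha/2)$. Fay and Feuer [4] called the interval $(L(x;\alpha), U(x;\alpha))$ 'exact' while others, for example, Johnson and Kotz [11], use the term 'approximate' interval.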
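The two defining equations for $L(x;\alpha)$ and $U(x;\alpha)$ can be solved numerically without chi-squared tables. The sketch below does this by bisection on the Poisson tail probability; it is an illustration under that assumption rather than the chi-squared formula itself, and the helper names `poisson_tail`, `solve_increasing` and `exact_poisson_ci` are hypothetical.

```python
import math


def poisson_tail(x, theta):
    """P(X >= x) for X ~ Poisson(theta), by direct summation of the pmf."""
    if x <= 0:
        return 1.0
    term = math.exp(-theta)          # P(X = 0)
    cdf = term
    for k in range(1, x):            # accumulate P(X <= x - 1)
        term *= theta / k
        cdf += term
    return 1.0 - cdf


def solve_increasing(f, target):
    """Find t >= 0 with f(t) = target for a continuous increasing f."""
    lo, hi = 0.0, 1.0
    while f(hi) < target:            # expand the bracket
        hi *= 2.0
    for _ in range(200):             # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_poisson_ci(x, alpha=0.05):
    """Solve P(X >= x | L) = alpha/2 and P(X >= x+1 | U) = 1 - alpha/2."""
    lower = 0.0 if x == 0 else solve_increasing(
        lambda t: poisson_tail(x, t), alpha / 2)
    upper = solve_increasing(lambda t: poisson_tail(x + 1, t), 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    print(exact_poisson_ci(10))      # close to (4.80, 18.39)
```

For $x = 0$ the lower limit is set to 0, and the upper limit is $-\ln(\alpha/2)$, since $P(X \geq 1 \mid \theta) = 1 - e^{-\theta}$.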
Let $w_{i(1)} \leq \cdots \leq w_{i(J)}$ be the ordered values of $w_{ij}$, $j = 1, \ldots, J$. Fay and Feuer [4] assumed that a mixture of Poisson distributions is approximately distributed as a gamma distribution; that is,

$$P\left(\sum_{j=1}^{J} w_{ij} D_{ij} \geq y \,\Big|\, \mu_i, \nu_i\right) \approx \int_0^{\mu_i} f_{Z_i}\left(z \,\Big|\, \frac{y^2}{\nu_i}, \frac{\nu_i}{y}\right) dz$$

where $Z_i \sim G(y^2/\nu_i, \nu_i/y)$. This assumption essentially means that the distribution of a linear combination of independent Poisson random variables is approximately distributed as a gamma random variable with the mean and variance of the gamma distribution equal to the mean and variance of the linear combination, respectively. Fay and Feuer [4] used this approximation to construct approximate $100(1-\alpha)\%$ CIs for the true age-adjusted rates $\mu_i$.
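The lower confidence limit $L(r_i;\alpha)$ was obtained by solving the equation

$$\frac{\alpha}{2} = P\left(\sum_{j=1}^{J} w_{ij} D_{ij} \geq r_i \,\Big|\, \mu_i, \nu_i\right) \approx \int_0^{L(\mu_i;\nu_i)} f_{Z_i}\left(z \,\Big|\, \frac{r_i^2}{\nu_i}, \frac{\nu_i}{r_i}\right) dz$$

This yields

$$L(r_i;\hat\nu_i;\alpha) = G^{-1}\left(\frac{\alpha}{2}; \frac{r_i^2}{\hat\nu_i}, \frac{\hat\nu_i}{r_i}\right) = \frac{\hat\nu_i}{2 r_i}\left(\chi^2_{2r_i^2/\hat\nu_i}\right)^{-1}\left(\frac{\alpha}{2}\right)$$

where $G^{-1}$ is the inverse function of the gamma distribution function and $(\chi^2_l)^{-1}(\alpha)$ denotes the $100\alpha$th percentile of the chi-squared distribution with $l$ degrees of freedom.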
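Note that when $r_i = 0$, $L(r_i;\hat\nu_i;\alpha)$ is not defined. For the upper confidence limit $U(r_i;\alpha)$, Fay and Feuer [4] solved the equation

$$1 - \frac{\alpha}{2} = P\left(\sum_{j=1}^{J} w_{ij} D_{ij} > r_i \,\Big|\, \mu_i, \nu_i\right) \geq P\left(\sum_{j=1}^{J} w_{ij} D_{ij} \geq r_i + w_{i(J)} \,\Big|\, \mu_i, \nu_i\right) \approx \int_0^{U(\mu_i;\nu_i)} f_{Z_i}\left(z \,\Big|\, \frac{(r_i + w_{i(J)})^2}{\nu_i + w_{i(J)}^2}, \frac{\nu_i + w_{i(J)}^2}{r_i + w_{i(J)}}\right) dz$$

resulting in

$$U(r_i;\hat\nu_i, w_{i(J)};\alpha) = G^{-1}\left(1 - \frac{\alpha}{2}; \frac{r_i^2}{\hat\nu_i}, \frac{\hat\nu_i}{r_i}, w_{i(J)}\right) = \frac{\hat\nu_i + w_{i(J)}^2}{2(r_i + w_{i(J)})}\left(\chi^2_{2(r_i + w_{i(J)})^2/(\hat\nu_i + w_{i(J)}^2)}\right)^{-1}\left(1 - \frac{\alpha}{2}\right)$$

Fay and Feuer [4] performed simulations to study the performance of their gamma CIs $(L(r_i;\hat\nu_i;\alpha), U(r_i;\hat\nu_i;w_{i(J)};\alpha))$ and found that the upper confidence limits were more conservative than those based on the ABC intervals [5–7] and the chi-squared intervals of Dobson et al. [8], henceforth referred to as DKES intervals. For completeness, the ABC and DKES intervals are given in Appendix B.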
Fay and Feuer [4] mentioned that when the weights $w_{ij}$ for all $j$ are equal to a constant, $c_i > 0$, say, the CI $(c_i L(d_i;\alpha), c_i U(d_i;\alpha))$ for $\mu_i = E\left(\sum_{j=1}^{J} w_{ij} D_{ij}\right) = c_i E(D_i)$ is exact, with $D_i \sim \mathrm{Po}\left(\sum_{j=1}^{J} n_{ij}\lambda_{ij}\right)$. However, note that since $w_{ij} = w_j/n_{ij}$ depends on both the standards $w_j$ and the mid-year populations $n_{ij}$, the condition that the $w_{ij}$ are equal to a constant for all $j$ is not easily satisfied. For example, a sufficient condition for it to hold is that the $w_j$ are all equal and the $n_{ij}$ are all equal. Another sufficient condition for the $w_{ij}$ to be equal to a constant for all $j$ is that $n_{ij}$ is proportional to $w_j$, independent of $i$, for all $j$. If a populous state like California or New York has the age-group distribution of its population similar to that of the entire US, then for that state, one may expect $n_{ij}$ to be proportional to $w_j$ and hence the CI for $\mu_i$ to be exact.
Since $w_{i(l)} \leq w_{i(l+1)}$, we have

$$P\left(\sum_{j=1}^{J} w_{ij} D_{ij} > r_i \,\Big|\, \mu_i, \nu_i\right) \geq P\left(\sum_{j=1}^{J} w_{ij} D_{ij} \geq r_i + w_{i(l)} \,\Big|\, \mu_i, \nu_i\right) \geq P\left(\sum_{j=1}^{J} w_{ij} D_{ij} \geq r_i + w_{i(l+1)} \,\Big|\, \mu_i, \nu_i\right), \quad l = 1, \ldots, J - 1$$

Thus, proceeding as above, one can construct the upper confidence limits $U(r_i;\hat\nu_i;w_{i(1)};\alpha), U(r_i;\hat\nu_i;w_{i(2)};\alpha), \ldots, U(r_i;\hat\nu_i;w_{i(J)};\alpha)$, varying from the most liberal upper limit to the most conservative upper limit. In fact, there are an infinite number of choices for such an upper confidence limit.
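As a compromise, we propose an upper limit that is based on the mean $\bar w_i = (1/J)\sum_{j=1}^{J} w_{ij}$ and that depends on all $w_{i(l)}$, $l = 1, \ldots, J$. As mentioned earlier, this assumes distributing a count of 1 uniformly to all $J$ age-groups. Thus,

$$1 - \frac{\alpha}{2} = P\left(\sum_{j=1}^{J} w_{ij} D_{ij} > r_i \,\Big|\, \mu_i, \nu_i\right) \approx P\left(\sum_{j=1}^{J} w_{ij} D_{ij} \geq r_i + \bar w_i \,\Big|\, \mu_i, \nu_i\right) = P\left(\sum_{j=1}^{J} w_{ij} D_{ij} \geq \tilde r_i \,\Big|\, \mu_i, \nu_i\right)$$

Now, assuming that the $(d_{ij} + 1/J)$ have means equal to their variances, similar to the Poisson distribution, so that

$$\mathrm{Var}\left(\sum_{j=1}^{J} w_{ij}\left(d_{ij} + \frac{1}{J}\right)\right) = \sum_{j=1}^{J} w_{ij}^2\left(d_{ij} + \frac{1}{J}\right)$$

and using the gamma approximation, the upper confidence limit for $\mu_i$ is given by

$$U(\tilde r_i;\tilde\nu_i;\bar w_i;\alpha) = \frac{\tilde\nu_i}{2\tilde r_i}\left(\chi^2_{2\tilde r_i^2/\tilde\nu_i}\right)^{-1}\left(1 - \frac{\alpha}{2}\right)$$

Therefore, the proposed gamma CI for $\mu_i$ is $(L(r_i;\hat\nu_i;\alpha), U(\tilde r_i;\tilde\nu_i;\bar w_i;\alpha))$. Another approximation of the upper confidence limit based on the mean $\bar w_i$ can be obtained by using $(\hat\nu_i + \bar w_i^2)$ instead of $\tilde\nu_i$. This results in the following CI: $(L(r_i;\hat\nu_i;\alpha), U(\tilde r_i;\hat\nu_i + \bar w_i^2;\bar w_i;\alpha))$. Through simulations (not shown here), we found that these two intervals performed very similarly. Therefore, we will focus only on $(L(r_i;\hat\nu_i;\alpha), U(\tilde r_i;\tilde\nu_i;\bar w_i;\alpha))$. Note that the lower limits of the gamma interval of Fay and Feuer [4] and the modified gamma interval are the same. We shall define the CI for $\mu_i$ when $r_i = 0$ as $(0, U(\tilde r_i;\tilde\nu_i;\bar w_i;\alpha))$, thus ensuring a coverage probability of at least $(1 - \alpha)$.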
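The gamma interval of Fay and Feuer and the modified gamma interval differ only in the upper limit, so both can be computed side by side. The sketch below is one possible implementation using only the Python standard library: the gamma distribution function is evaluated with a textbook series and continued-fraction expansion and inverted by bisection, which is a substitute for a library quantile routine and not part of the paper. The names `gamma_cdf`, `gamma_ppf` and `gamma_intervals` are hypothetical.

```python
import math


def gamma_cdf(x, shape, scale):
    """Gamma distribution function: regularized incomplete gamma P(shape, x/scale)."""
    x = x / scale
    if x <= 0.0:
        return 0.0
    log_front = -x + shape * math.log(x) - math.lgamma(shape)
    if x < shape + 1.0:                       # power series
        term = total = 1.0 / shape
        a = shape
        for _ in range(100000):
            a += 1.0
            term *= x / a
            total += term
            if term < total * 1e-15:
                break
        return total * math.exp(log_front)
    tiny = 1e-300                             # continued fraction (modified Lentz)
    b = x + 1.0 - shape
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 100000):
        an = -i * (i - shape)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return 1.0 - math.exp(log_front) * h


def gamma_ppf(p, shape, scale):
    """Inverse of gamma_cdf by bracketing and bisection."""
    lo, hi = 0.0, (shape + 1.0) * scale
    while gamma_cdf(hi, shape, scale) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma_intervals(d, wij, alpha=0.05):
    """Return (Fay-Feuer CI, modified gamma CI) for mu_i from counts d and weights w_ij."""
    J = len(d)
    r = sum(w * x for w, x in zip(wij, d))
    nu = sum(w * w * x for w, x in zip(wij, d))
    w_max, w_bar = max(wij), sum(wij) / J
    r_t = r + w_bar                                   # r_tilde
    nu_t = nu + sum(w * w for w in wij) / J           # nu_tilde
    lower = gamma_ppf(alpha / 2, r * r / nu, nu / r) if r > 0 else 0.0
    upper_ff = gamma_ppf(1 - alpha / 2, (r + w_max) ** 2 / (nu + w_max ** 2),
                         (nu + w_max ** 2) / (r + w_max))
    upper_mod = gamma_ppf(1 - alpha / 2, r_t * r_t / nu_t, nu_t / r_t)
    return (lower, upper_ff), (lower, upper_mod)
```

When all weights equal a constant $c_i$, both intervals in this sketch collapse to $c_i$ times the exact Poisson interval for the total count, in line with the remark of Fay and Feuer quoted earlier. For example, `gamma_intervals([4, 6], [2.0, 2.0])` gives two identical intervals.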
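Fay [9] developed a confidence interval for the ratio of two age-adjusted rates, $\mu_i/\mu_{i'}$, for $\mu_i, \mu_{i'} > 0$, based on $R_i = \sum_{j=1}^{J} w_{ij} D_{ij}$ and $R_{i'} = \sum_{j=1}^{J} w_{i'j} D_{i'j}$, where $D_{ij}$ and $D_{i'j}$ are independent. Assuming the gamma approximations for $R_i$ and $R_{i'}$, Fay [9] used the result that, conditional on $D_{ij} + D_{i'j} = t_j$, the distribution of $D_{ij}$ is a binomial distribution with parameters $t_j$ and $n_{ij}(\lambda_{ij}/\lambda_{i'j}) / \{n_{ij}(\lambda_{ij}/\lambda_{i'j}) + n_{i'j}\}$. For constructing the lower confidence limit for $\mu_i/\mu_{i'}$, Fay [9] assumed that $\mu_i$ is distributed as gamma with mean $r_i$ and variance $\hat\nu_i$ and that $\mu_{i'}$ is distributed as gamma with mean $(r_{i'} + W_{i'})$ and variance $(\hat\nu_{i'} + W_{i'}^2)$ and used the result that, conditional on $t_j$,

$$\frac{r_{i'} + W_{i'}}{r_i}\,\frac{\mu_i}{\mu_{i'}} \sim F\left(\frac{2r_i^2}{\hat\nu_i}, \frac{2(r_{i'} + W_{i'})^2}{\hat\nu_{i'} + W_{i'}^2}\right)$$

where $W_{i'} = \max_{j: d_{i'j} < t_j}\{w_{i'j}\}$ and, for independent $\chi^2_m \overset{d}{=} G(m/2, 2)$ and $\chi^2_n \overset{d}{=} G(n/2, 2)$, $F(m, n) \overset{d}{=} (\chi^2_m/m)/(\chi^2_n/n)$ denotes the $F$ distribution with numerator degrees of freedom (d.f.) $m$ and denominator d.f. $n$, with density given by

$$g(x \mid m, n) = \frac{\Gamma((m+n)/2)}{\Gamma(m/2)\Gamma(n/2)}\left(\frac{m}{n}\right)^{m/2}\frac{x^{(m/2)-1}}{(1 + (m/n)x)^{(m+n)/2}}, \quad 0 < x < \infty$$

Since the numerator and the denominator chi-squared random variables in $F\left(2r_i^2/\hat\nu_i,\ 2(r_{i'} + W_{i'})^2/(\hat\nu_{i'} + W_{i'}^2)\right)$ depend on $t_j$, the unconditional distribution of $\mu_i/\mu_{i'}$ is a mixture of $F$ distributions, and not an $F$ distribution.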
The lower confidence limit is

$$\frac{r_i}{r_{i'} + W_{i'}}\,F^{-1}_{\left(2r_i^2/\hat\nu_i,\ 2(r_{i'} + W_{i'})^2/(\hat\nu_{i'} + W_{i'}^2)\right)}\left(\frac{\alpha}{2}\right)$$

where $F^{-1}_{(a,b)}(p)$ is the $100p$th percentile of $F(a, b)$. Now, assuming that $\mu_i$ is distributed as gamma with mean $(r_i + W_i)$ and variance $(\hat\nu_i + W_i^2)$ and that $\mu_{i'}$ is distributed as gamma with mean $r_{i'}$ and variance $\hat\nu_{i'}$, Fay [9] derived the upper confidence limit to be

$$\frac{r_i + W_i}{r_{i'}}\,F^{-1}_{\left(2(r_i + W_i)^2/(\hat\nu_i + W_i^2),\ 2r_{i'}^2/\hat\nu_{i'}\right)}\left(1 - \frac{\alpha}{2}\right)$$

Note that this approximation cannot be readily applied for constructing CIs for the ratios $\mu_i/\mu$, that is, the ratios of the age-adjusted rates for the regions $i$ to the US age-adjusted rate, as the latter depends on the former.
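We propose a modification of the above CI for $\mu_i/\mu_{i'}$. For the lower limit, we assume that $\mu_i$ is distributed as gamma with mean $r_i$ and variance $\hat\nu_i$ and that $\mu_{i'}$ is distributed as gamma with mean $\tilde r_{i'}$ and variance $\tilde\nu_{i'}$, and since the two distributions are independent chi-squares, we have

$$\frac{\tilde r_{i'}}{r_i}\,\frac{\mu_i}{\mu_{i'}} \sim F\left(\frac{2r_i^2}{\hat\nu_i}, \frac{2\tilde r_{i'}^2}{\tilde\nu_{i'}}\right)$$

This results in the lower limit $(r_i/\tilde r_{i'})\,F^{-1}_{(2r_i^2/\hat\nu_i,\ 2\tilde r_{i'}^2/\tilde\nu_{i'})}(\alpha/2)$. Similarly, the upper limit can be obtained. The proposed CI for $\mu_i/\mu_{i'}$ is

$$\left(\frac{r_i}{\tilde r_{i'}}\,F^{-1}_{(2r_i^2/\hat\nu_i,\ 2\tilde r_{i'}^2/\tilde\nu_{i'})}\left(\frac{\alpha}{2}\right),\ \frac{\tilde r_i}{r_{i'}}\,F^{-1}_{(2\tilde r_i^2/\tilde\nu_i,\ 2r_{i'}^2/\hat\nu_{i'})}\left(1 - \frac{\alpha}{2}\right)\right)$$

Another CI for $\mu_i/\mu_{i'}$, using $(r_i + \bar w_i)$ and $(\hat\nu_i + \bar w_i^2)$ instead of $\tilde r_i$ and $\tilde\nu_i$, is given by

$$\left(\frac{r_i}{r_{i'} + \bar w_{i'}}\,F^{-1}_{\left(2r_i^2/\hat\nu_i,\ 2(r_{i'} + \bar w_{i'})^2/(\hat\nu_{i'} + \bar w_{i'}^2)\right)}\left(\frac{\alpha}{2}\right),\ \frac{r_i + \bar w_i}{r_{i'}}\,F^{-1}_{\left(2(r_i + \bar w_i)^2/(\hat\nu_i + \bar w_i^2),\ 2r_{i'}^2/\hat\nu_{i'}\right)}\left(1 - \frac{\alpha}{2}\right)\right)$$

Once again, we mention that this interval performs similarly to the one above, and we will not focus on it. We further remark that, unlike in Fay [9], these intervals do not assume the dependence of the $w_{ij}$ on $t_j$.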
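The modified $F$ interval needs percentiles of an $F$ distribution with non-integer degrees of freedom. As a rough illustration only, the sketch below estimates those percentiles by Monte Carlo, drawing the two chi-squared variables as gamma variates with `random.gammavariate`; this simulation step is a stand-in for an exact quantile routine and is not the method used in the paper. The function names and the argument order are hypothetical.

```python
import random


def f_quantile_mc(p, m, n, draws=50000, seed=0):
    """Monte Carlo estimate of the 100p-th percentile of F(m, n).

    Uses F(m, n) = (chi2_m / m) / (chi2_n / n) with chi2_k = G(k/2, 2);
    m and n may be non-integer.
    """
    rng = random.Random(seed)
    sample = sorted(
        (rng.gammavariate(m / 2.0, 2.0) / m) / (rng.gammavariate(n / 2.0, 2.0) / n)
        for _ in range(draws)
    )
    return sample[min(draws - 1, int(p * draws))]


def modified_f_interval(r1, nu1, rt1, nut1, r2, nu2, rt2, nut2, alpha=0.05):
    """Approximate modified F interval for mu_i / mu_i'.

    Region i supplies (r1, nu1, rt1, nut1) = (r, nu_hat, r_tilde, nu_tilde);
    region i' supplies (r2, nu2, rt2, nut2) in the same order.
    """
    lower = (r1 / rt2) * f_quantile_mc(alpha / 2,
                                       2 * r1 ** 2 / nu1, 2 * rt2 ** 2 / nut2)
    upper = (rt1 / r2) * f_quantile_mc(1 - alpha / 2,
                                       2 * rt1 ** 2 / nut1, 2 * r2 ** 2 / nu2)
    return lower, upper


if __name__ == "__main__":
    # Illustrative inputs only (not taken from the SEER data).
    print(modified_f_interval(2.5e-6, 1.75e-12, 3.6667e-6, 3.5e-12,
                              3.0e-6, 2.0e-12, 4.0e-6, 3.6e-12))
```

Because the percentiles are simulated, the limits carry Monte Carlo error; increasing `draws` reduces it at the cost of run time.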
2.2 Normal approximations
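Define $R_{ij} = D_{ij}/n_{ij}$ $(= D_{ij}/(n\xi_{ij}\xi_j))$. Let $n \to \infty$ so that $0 < \xi_{ij}, \xi_j < 1$. Note that $0 < \lambda_{ij} < \infty$. Then as $\min\{n_{ij}\lambda_{ij}\} \to \infty$,

$$\left(\frac{n\xi_{ij}\xi_j}{\lambda_{ij}}\right)^{1/2}(R_{ij} - \lambda_{ij}) \overset{\text{ind}}{\longrightarrow} N(0, 1), \quad i = 1, \ldots, I;\ j = 1, \ldots, J$$

That is, the $R_{ij}$ are independent and asymptotically normally distributed, $R_{ij} \sim AN(\lambda_{ij}, \lambda_{ij}/(n\xi_{ij}\xi_j))$. The other asymptotic results based on $R_{ij}$, the $100(1-\alpha)\%$ CIs for $\mu_i$, $\mu$, $\mu_i/\mu_{i'}$, $\mu_i/\mu$, $\mu_i/\mu_{(i)}$ and $\mu_i - \mu$, and their logarithmic and logit transformations are presented in Appendix C. In particular, the $100(1-\alpha)\%$ CIs for $\mu_i/\mu$ and $\mu_i - \mu$, based on the correlated age-adjusted rates, are given by

$$\frac{\mu_i}{\mu} = \left(\frac{\hat\mu_i}{\hat\mu} \pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2\hat\mu^2 + \hat\sigma^2\hat\mu_i^2 - 2\hat\rho_i\hat\mu_i\hat\mu}{n\hat\mu^4}}\right) \vee 0$$

$$\mu_i - \mu = \hat\mu_i - \hat\mu \pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2 + \hat\sigma^2 - 2\hat\rho_i}{n}}$$

where $a \vee b = \max(a, b)$. When $\rho_i = 0$, which is true iff $\lambda_{ij} = 0$ for all $j$, these CIs reduce to (see, e.g., Ries et al. [10] for the CI of $\mu_i - \mu$ when $\rho_i = 0$)

$$\frac{\mu_i}{\mu} = \left(\frac{\hat\mu_i}{\hat\mu} \pm z_{\alpha/2}\sqrt{\frac{1}{n}\,\frac{1}{\hat\mu^4}\left(\hat\sigma_i^2\hat\mu^2 + \hat\sigma^2\hat\mu_i^2\right)}\right) \vee 0, \quad \mu_i - \mu = \hat\mu_i - \hat\mu \pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2 + \hat\sigma^2}{n}}$$

Since $\rho_i > 0$, the CI for $\mu_i - \mu$ that ignores the adjustment for $\rho_i$ is wider, and hence more conservative.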
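The two correlated-rate intervals above are straightforward to evaluate once the point estimates, variances and covariance are available. The sketch below is an illustration with a hypothetical function name; it takes the variances and covariance already divided by $n$ (that is, $\hat\sigma_i^2/n$, $\hat\sigma^2/n$ and $\hat\rho_i/n$), which is algebraically the same as the displayed formulas.

```python
import math
from statistics import NormalDist


def normal_cis(mu_i, mu, var_i, var, cov, alpha=0.05):
    """Normal CIs for mu_i/mu and mu_i - mu allowing for Cov(R_i, R).

    var_i = sigma_hat_i^2 / n, var = sigma_hat^2 / n, cov = rho_hat_i / n.
    Both ratio limits are truncated at 0, following the 'a v 0' notation.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se_ratio = math.sqrt(
        (var_i * mu ** 2 + var * mu_i ** 2 - 2 * cov * mu_i * mu) / mu ** 4)
    se_diff = math.sqrt(var_i + var - 2 * cov)
    ratio, diff = mu_i / mu, mu_i - mu
    ratio_ci = (max(ratio - z * se_ratio, 0.0), max(ratio + z * se_ratio, 0.0))
    diff_ci = (diff - z * se_diff, diff + z * se_diff)
    return ratio_ci, diff_ci


if __name__ == "__main__":
    # Illustrative numbers only.
    with_cov = normal_cis(2.0, 1.0, 0.04, 0.01, 0.01)
    without_cov = normal_cis(2.0, 1.0, 0.04, 0.01, 0.0)
    print(with_cov)
    print(without_cov)   # wider difference interval when cov is ignored
```

Setting `cov=0.0` reproduces the uncorrelated special case, which makes the extra width of that interval easy to inspect for a given region.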
2.3 Beta approximations
In general, the age-adjusted rates are less than 1, and equal to 1 if and only if there is one age-group with both the cancer count and the population at risk for that age-group equal to 1, which is not a practical case. A rationale for the beta approximation is as follows. Let $R_i = \sum_{j=1}^{J} w_j R_{ij}$, where $R_{ij} = D_{ij}/n_{ij}$. Let $D_{ij}$ and $\bar D_{ij}$ be independent Poisson random variables with means $n_{ij}\lambda_{ij}$ and $n_{ij}(1 - \lambda_{ij})$, respectively. Then the distribution of $D_{ij} \mid D_{ij} + \bar D_{ij} = n_{ij}, \lambda_{ij}$ is $\mathrm{Bin}(n_{ij}, \lambda_{ij})$, a binomial distribution with parameters $n_{ij}$ and $\lambda_{ij}$ [12]. Using the result given in Appendix D, we can approximate the distribution of $R_i$ by a beta distribution with parameters $\hat a_i$ and $\hat b_i$, $\mathrm{Be}(\hat a_i, \hat b_i)$, where

$$\hat a_i = \tilde r_i\left(\frac{\tilde r_i(1 - \tilde r_i)}{\tilde\nu_i} - 1\right), \quad \hat b_i = (1 - \tilde r_i)\left(\frac{\tilde r_i(1 - \tilde r_i)}{\tilde\nu_i} - 1\right)$$

We define an approximate $100(1-\alpha)\%$ CI for $\mu_i$ as $(L_{\bar R_i}, U_{\bar R_i})$, where $L_{\bar R_i}$ and $U_{\bar R_i}$ are obtained by solving the following incomplete beta integrals:

$$\int_0^{L_{\bar R_i}} B(x \mid \hat a_i, \hat b_i)\,dx = \frac{\alpha}{2}, \quad \int_0^{U_{\bar R_i}} B(x \mid \hat a_i, \hat b_i)\,dx = 1 - \frac{\alpha}{2}$$

Here, $B(x \mid a, b)$ is the density of a beta distribution, $\mathrm{Be}(a, b)$, with parameters $a$ and $b$.
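The parameters $\hat a_i$ and $\hat b_i$ are the moment-matching values: a $\mathrm{Be}(\hat a_i, \hat b_i)$ variable has mean $\tilde r_i$ and variance $\tilde\nu_i$. The short sketch below computes them and checks that property; the function names are hypothetical, and the limits themselves would still require a beta quantile routine from a numerical library, which is not shown here.

```python
def beta_parameters(r_tilde, nu_tilde):
    """Moment-matched beta parameters (a_hat, b_hat) for mean r_tilde and variance nu_tilde."""
    k = r_tilde * (1.0 - r_tilde) / nu_tilde - 1.0
    return r_tilde * k, (1.0 - r_tilde) * k


def beta_mean_var(a, b):
    """Mean and variance of Be(a, b), used here only as a consistency check."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


if __name__ == "__main__":
    a, b = beta_parameters(0.2, 0.01)    # illustrative values only
    print(a, b, beta_mean_var(a, b))
```

The parameters are positive only when $\tilde\nu_i < \tilde r_i(1 - \tilde r_i)$, a condition worth checking before any quantile is computed.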
3 Examples and simulations
As an illustration, age-adjusted tongue cancer mortality rates were calculated for each of the regions. Tongue cancer occurs mostly among the elderly. The 2002 mortality data for tongue cancer, even though available from the NCHS, were obtained from the SEER Program of the NCI (see the web site: www.seer.cancer.gov). We carried out two different simulation studies to compare the performance of the proposed gamma, beta and normal (with lower limits truncated at 0) intervals with that of the gamma interval of Fay and Feuer [4]. In the first simulation study, we took the true means of the Poisson distributions of $D_{ij}$ to be the observed numbers of deaths in the $(i, j)$th cell, where $i$ indexes the 51 regions of the US (50 states and Washington, DC) and $j$ indexes the 19 age-groups ($i = 1, \ldots, 51$; $j = 1, \ldots, 19$). Therefore, the true value of $\mu_i$ is the observed value of the age-adjusted rate for each $i$. From the Poisson distributions, we generated 10 000 values of $d_{ij}$ and obtained the corresponding values of the age-adjusted rates $R_i$ using the normalized weights $w_j$, based on the 2000 US standard population, so that $\sum_{j=1}^{J} w_j = 1$. We computed approximate 95% CIs for $\mu_i$ for each of the 51 regions using the gamma intervals of Fay and Feuer [4] and the proposed gamma, beta and normal intervals. Additionally, we compared the $F$ interval of Fay [9] for $\mu_i/\mu_{(i)}$ with the proposed $F$ and normal intervals (with lower limits truncated at 0). We compared the age-adjusted rate of each of the 51 regions with the age-adjusted rate of the rest of the US. Once again, we chose the year 2002 tongue cancer mortality age-adjusted rates for the 51 regions. The simulations were carried out assuming the 2000 standard population, generating $d_{ij}$ from independent Poisson distributions with means equal to the observed $d_{ij}$.
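The design of the first simulation study can be mimicked on a small scale. The sketch below estimates the empirical coverage of a normal interval for a single region when the cell counts are Poisson with known means. It is a simplified illustration: the interval is assumed to have the form $r_i \pm z_{\alpha/2}\sqrt{\hat\nu_i}$ with the lower limit truncated at 0 (the paper's exact form is in Appendix C), the Poisson sampler is a basic multiplication method, and all function names and inputs are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_draw(rng, lam):
    """Draw from Poisson(lam) by the multiplication method; returns 0 when lam <= 0."""
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def normal_coverage(true_means, wij, reps=2000, alpha=0.05, seed=1):
    """Share of simulated normal CIs (lower limit truncated at 0) that contain mu_i."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mu = sum(w * m for w, m in zip(wij, true_means))     # true age-adjusted rate
    hits = 0
    for _ in range(reps):
        d = [poisson_draw(rng, m) for m in true_means]
        r = sum(w * x for w, x in zip(wij, d))
        half = z * math.sqrt(sum(w * w * x for w, x in zip(wij, d)))
        if max(r - half, 0.0) <= mu <= r + half:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # 19 age-groups with made-up means and weights (not SEER data).
    means = [5.0] * 19
    weights = [(j + 1) * 1e-7 for j in range(19)]
    print(normal_coverage(means, weights))
```

Replacing the interval inside the loop with any of the other constructions gives the corresponding coverage estimate under the same simulated counts.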
Table 1 gives the results of the first simulation study. Columns 2 and 3 of the table give the observed (true) tongue cancer mortality counts and age-adjusted rates (per 100 000 mid-year population) for the 50 states, the District of Columbia, and the four Census Bureau Regions (Northeast, Midwest, West, and South). Column 4 presents the empirical coverage probabilities of the 95% CIs for the (simulated) age-adjusted rates based on the gamma, modified gamma, beta, and normal approximations. Column 5 shows the observed (true) ratios of the age-adjusted rate of each of the 51 regions to that of the rest of the US. Column 6 gives the empirical coverage probabilities of the 95% CIs for the (simulated) rate ratios based on the $F$, modified $F$ and normal approximations.
Both the modified gamma and modified $F$ intervals are more efficient than their counterparts because their empirical coverage probabilities are at least 95% but no higher than those of the gamma and the $F$ intervals. The beta and normal intervals are slightly liberal: their empirical coverage probabilities fall below 95% for a number of regions.
In the second simulation study, we considered the effect of randomly generated values of $w_{ij}$ and $d_{ij}$ on the performance of the gamma, beta and normal intervals. Here, the subscript $i$ does not play any role, and is treated as a dummy variable, but it is kept for the sake of notational consistency. We generated 19 numbers, corresponding to the $J = 19$ age-groups, from the uniform $U(0, 1)$ distribution and standardized them (and called them $w_{ij}$; $j = 1, \ldots, 19$) so that $\sum_{j=1}^{19} w_{ij}$ is a very small number, say, equal to $5.0 \times 10^{-6}$. We again generated 19 numbers from $U(0, 1)$ and standardized them (and called them $d_{ij}$; $j = 1, \ldots, 19$) so that their sum is small, $\sum_{j=1}^{19} d_{ij} = 20$. These standardized numbers were taken to be the values of the true means $\lambda_{ij}$, $j = 1, \ldots, 19$.
Then, we simulated 10 000 values of $d_{ij}$ from the Poisson distributions with means $\lambda_{ij}$, $j = 1, \ldots, 19$. From these, we calculated the age-adjusted rates $r_i$ and the 95% CIs for $\mu_i$ using the gamma and normal intervals. We also calculated the variance of the $w_{ij}$. We repeated the entire process 500 times. Note that we could have standardized the sum $\sum_{j=1}^{19} w_{ij}$ to any other small number, but we chose $5.0 \times 10^{-6}$ so that it was similar to what we have based on the 2000 US standard population and the 2002 age-adjusted rates. We also could have standardized the sum $\sum_{j=1}^{19} d_{ij}$ to a number other than 20, possibly 50, but we kept it at 20 to see the effect of small sample size; that is, the small number of total mortality counts for region $i$.
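One replication of the set-up step in this second study can be sketched as follows. The helper name `standardized_uniforms` and the seed are hypothetical, and the sample variance is used for the variance of the weights as an assumption, since the text does not say which variance formula was applied.

```python
import random
import statistics


def standardized_uniforms(rng, size, total):
    """Draw `size` U(0, 1) numbers and rescale them to sum to `total`."""
    u = [rng.random() for _ in range(size)]
    s = sum(u)
    return [total * x / s for x in u]


rng = random.Random(2002)                        # arbitrary seed for reproducibility
wij = standardized_uniforms(rng, 19, 5.0e-6)     # weights w_ij summing to 5.0e-6
lam = standardized_uniforms(rng, 19, 20.0)       # true means summing to 20
mu_i = sum(w * m for w, m in zip(wij, lam))      # true age-adjusted rate
var_w = statistics.variance(wij)                 # x-axis value in Figures 1 to 4 (assumed sample variance)
```

Repeating these lines 500 times, each followed by 10 000 Poisson draws with means `lam`, reproduces the structure of the study described above.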
Table 1 Comparisons of empirical coverage probabilities for 95% CIs for the age-adjusted mortality rates of states/Census Bureau Regions and ratios of these rates to the rest of the US for tongue cancer (true rates are per 100 000; coverages are in %)

                                Coverage of 95% CI (rate)             Coverage of 95% CI (ratio)
State/region    True   True     Gamma  Modified  Beta  Normal  True   F     Modified  Normal
                count  rate            gamma                   ratio        F
Alaska          1      0.25     97.0   97.0      97.0  99.3    0.38   97.0  97.0      99.3
Wyoming         3      0.56     98.8   98.8      96.5  98.9    0.87   98.8  98.8      99.0
Montana         4      0.41     98.0   98.0      95.3  99.2    0.63   98.1  98.1      99.1
Vermont         4      0.58     98.1   98.1      95.0  99.3    0.89   98.3  98.3      99.2
Delaware        5      0.60     98.8   98.8      96.9  96.1    0.92   98.7  98.7      96.1
Rhode Island    5      0.45     98.6   98.6      96.6  95.2    0.69   98.4  98.4      96.1
Washington DC   6      1.06     97.9   96.4      94.1  97.2    1.62   97.9  96.8      97.0
Utah            6      0.34     97.3   96.7      94.8  96.8    0.52   97.7  96.8      96.6
Nebraska        8      0.46     96.8   96.8      95.1  94.9    0.70   96.8  96.7      95.2
South Dakota    8      0.92     97.7   97.1      95.1  96.2    1.42   97.7  97.1      96.2
New Mexico      9      0.48     96.8   96.2      94.4  96.5    0.73   97.0  96.4      96.2
West Virginia   9      0.41     97.5   97.5      95.7  96.4    0.62   97.8  97.6      96.5
North Dakota    10     1.51     97.6   97.6      96.1  95.7    2.32   97.2  97.0      95.7
Hawaii          12     0.89     96.7   96.3      94.9  95.8    1.37   96.8  96.5      95.9
Iowa            12     0.36     96.7   96.2      94.6  95.4    0.56   96.7  96.2      95.5
Idaho           13     1.05     96.5   96.1      94.6  95.6    1.61   96.5  96.0      95.5
Kansas          13     0.46     96.8   96.5      95.0  95.3    0.70   96.7  96.4      95.4
Maine           14     0.93     97.8   97.1      95.9  95.7    1.43   97.5  97.0      95.9
New Hampshire   14     1.10     97.1   96.6      95.1  95.7    1.69   97.0  96.6      95.7
Mississippi     15     0.53     96.4   96.2      95.0  95.0    0.81   96.7  96.4      95.4
South Carolina  16     0.40     96.6   96.4      95.1  95.5    0.61   96.5  96.2      95.4
Colorado        18     0.51     96.7   96.3      95.4  95.3    0.77   96.7  96.3      95.4
Oklahoma        19     0.52     96.5   96.2      95.3  95.0    0.80   96.7  96.2      95.2
Alabama         20     0.43     96.8   96.4      95.1  95.9    0.65   96.8  96.6      95.8
Arkansas        22     0.74     96.7   96.5      95.3  95.7    1.14   96.6  96.3      95.5
Kentucky        22     0.52     96.6   96.4      95.2  95.5    0.80   96.4  96.3      95.5
Louisiana       25     0.58     95.7   95.5      94.4  95.0    0.89   95.8  95.6      94.9
Arizona         26     0.47     96.4   96.1      95.0  95.7    0.72   96.3  96.2      95.5
Nevada          26     1.21     96.4   95.6      94.7  94.9    1.88   96.5  95.6      94.9
Connecticut     27     0.69     96.3   96.0      94.6  95.3    1.06   96.1  95.9      95.3
Oregon          27     0.75     96.2   96.0      95.0  95.2    1.15   96.3  96.1      95.2
Minnesota       31     0.63     96.1   95.9      95.0  95.1    0.96   95.9  95.8      95.0
Missouri        36     0.59     96.0   95.9      94.9  95.2    0.91   96.1  95.9      95.2
Georgia         38     0.52     96.3   95.9      95.2  95.1    0.79   96.3  96.0      95.1
Virginia        38     0.53     96.3   96.1      95.2  95.3    0.81   96.2  96.1      95.3
Massachusetts   39     0.56     96.2   96.0      95.2  95.3    0.85   96.3  96.1      95.3
Maryland        40     0.75     96.1   96.0      95.2  95.5    1.16   96.4  96.2      95.5
Indiana         42     0.67     96.0   95.8      95.0  95.3    1.03   96.0  95.9      95.3
Wisconsin       43     0.75     95.6   95.5      94.5  94.8    1.16   95.6  95.5      94.8
Washington      47     0.80     95.9   95.7      94.9  95.2    1.23   95.9  95.7      95.2
Tennessee       50     0.83     96.0   95.9      94.9  95.2    1.28   96.0  95.9      95.2
North Carolina  53     0.64     96.0   95.9      95.2  95.3    0.99   96.2  96.1      95.4
New Jersey      59     0.65     96.0   95.8      95.2  95.3    1.00   96.1  95.9      95.3
Illinois        65     0.53     95.6   95.6      94.9  95.0    0.80   95.5  95.5      94.9
Ohio            68     0.56     95.9   95.8      95.1  95.2    0.86   96.0  95.9      95.3
Michigan        76     0.75     95.4   95.3      94.5  94.6    1.16   95.4  95.3      94.8
Pennsylvania    86     0.60     95.9   95.8      95.0  95.3    0.91   95.9  95.8      95.2
New York        118    0.59     95.9   95.8      95.3  95.3    0.90   95.7  95.6      95.2
Texas           140    0.76     95.8   95.7      95.2  95.2    1.19   95.5  95.4      95.2
Florida         145    0.70     96.1   96.0      95.5  95.5    1.09   95.8  95.7      95.3
California      254    0.81     95.7   95.6      95.3  95.3    1.28   95.8  95.7      95.4
Northeast       366    0.62     95.7   95.6      95.3  95.2    0.94   95.7  95.7      95.2
Midwest         412    0.61     95.6   95.5      95.2  95.2    0.93   95.2  95.2      94.8
West            446    0.74     95.6   95.6      95.4  95.3    1.18   95.7  95.6      95.4
South           663    0.64     95.0   95.0      94.8  94.9    0.98   95.2  95.1      95.0

Figure 1 Number of upper limits below true mean (over 10 000 replications).
Figure 2 Number of lower limits above true mean (over 10 000 replications).
Figure 3 Number of CIs not covering true mean (over 10 000 replications).

Note that out of 10 000 intervals, corresponding to each one of the 500 replications, it is expected that approximately 9500 intervals would contain the true mean $\mu_i$ and 500 would not; that is, it is expected that approximately 250 of the lower limits would be above the true mean $\mu_i$ and about the same number of the upper limits would be below the true mean $\mu_i$. In Figures 1 and 2, we plotted the 500 values of the variance of the normalized weights $w_{ij}$ on the x-axis, and the frequencies of the lower and upper limits of $\mu_i$ for the Fay and Feuer [4] intervals and the modified gamma, beta and normal intervals that fell, respectively, above and below the true mean $\mu_i$ on the y-axis. In Figure 3, we plotted both the lower and the upper limits against the variance of $w_{ij}$. Note that the two solid lines in Figure 3 correspond to the lower and upper 95% confidence limits for the true proportion, $p$, based on $\mathrm{Bin}(10\,000, 0.05)$, rescaled by multiplying by 10 000; that is, $10\,000\left(0.05 \pm 1.96\sqrt{0.05 \times 0.95/10\,000}\right) \approx (457, 543)$. Thus the expected numbers of the lower and upper limits of $\mu_i$ that fall above and below the true mean are between 457 and 543. In Figure 4, we plotted the lengths of the simulated intervals against the variance of $w_{ij}$.
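The reference band of 457 to 543 quoted for Figure 3 follows from the usual normal approximation to a binomial proportion. A short check of that arithmetic, with a hypothetical function name, is:

```python
import math


def binomial_band(n=10000, p=0.05, z=1.96):
    """Normal-approximation 95% limits for the count of non-covering CIs out of n."""
    half = z * math.sqrt(p * (1.0 - p) / n)
    return n * (p - half), n * (p + half)


if __name__ == "__main__":
    lo, hi = binomial_band()
    print(round(lo), round(hi))
```

A method whose count of non-covering intervals falls below the lower end of this band is conservative at the 95% level, and one above the upper end is liberal.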
From Figures 1–4, we observe that the modified gamma intervals have empirical coverage of at least 95%, but slightly lower than that of the gamma intervals of Fay and Feuer [4]. The beta and normal intervals (with lower limits truncated at 0 if they were negative) also have empirical coverage probabilities very close to 95%, and their widths are smaller than those of the gamma intervals. The coverage probabilities of the upper limits of both the beta and modified gamma intervals are identical and at least 97.5%, but slightly lower than those of the gamma intervals of Fay and Feuer [4]. The lower limits of the normal intervals are slightly more conservative than those of the gamma intervals, while the upper limits of the normal intervals are the least conservative. The advantage of using the modified gamma intervals over the gamma intervals is clear from Figure 3: whereas the gamma intervals show a coverage probability of around 97% as the variance of $w_{ij}$ increases, the modified intervals show a coverage probability staying slightly higher than 95%. Overall, from these simulation studies, the gamma intervals of Fay and Feuer [4] are more conservative than the proposed gamma intervals. The beta intervals are slightly more liberal than both the modified gamma and the gamma intervals of Fay and Feuer [4]. The normal intervals are more liberal than the beta intervals.
In the simulations, when the Poisson means were 0 because the observed $d_{ij}$ were 0, we set the simulated values of $D_{ij}$ equal to 0. This is because the $D_{ij}$ are non-negative random variables with equal means and variances, so if the mean of a $D_{ij}$ is 0 then that $D_{ij}$
Figure 4 Length of CIs.
is 0 with probability 1. Of course, when the $D_{ij}$ have positive means, there is a good chance that the simulations could still result in 0 for the simulated values of $D_{ij}$. We considered another simulation study in which we took the $D_{ij}$ to be Poisson with means $n_{ij} r_{a,j}$, where $r_{a,j} = \sum_{i=1}^{I} d_{ij}/n_j$ are the observed 2002 US age-specific mortality rates for tongue cancer. Note that in this case $\mu_i = \sum_{j=1}^{J} w_{ij}(n_{ij} r_{a,j})$ is a constant, independent of both $i$ and $j$, and the ratio of the means of two age-adjusted rates is 1. The results of this study were very similar to those given in Table 1.
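The kind of coverage simulation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code used for Table 1 or Figures 1–4: the weights and Poisson means are hypothetical (they are not the 2002 tongue cancer data), only the normal interval with the lower limit truncated at 0 is evaluated, the variance is estimated by $\sum_j w_j^2 d_j$, and the helper names `poisson` and `simulate` are introduced here for illustration.

```python
import math
import random
from statistics import NormalDist


def poisson(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method.

    Adequate for small means; returns 0 when the mean is 0, matching the
    convention that a count with mean 0 is 0 with probability 1.
    """
    if mean <= 0.0:
        return 0
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(weights, means, reps=2000, alpha=0.05, seed=1):
    """Count normal-interval misses on each side of the true weighted mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    true_mu = sum(w * m for w, m in zip(weights, means))
    upper_below = lower_above = 0
    for _ in range(reps):
        d = [poisson(rng, m) for m in means]
        est = sum(w * x for w, x in zip(weights, d))
        half = z * math.sqrt(sum(w * w * x for w, x in zip(weights, d)))
        lower, upper = max(est - half, 0.0), est + half
        if upper < true_mu:
            upper_below += 1      # upper limit falls below the true mean
        elif lower > true_mu:
            lower_above += 1      # lower limit falls above the true mean
    coverage = 1 - (upper_below + lower_above) / reps
    return upper_below, lower_above, coverage


# Hypothetical weights and Poisson means for three age groups.
print(simulate([1e-5, 2e-5, 3e-5], [20.0, 10.0, 0.0]))
```

The two counters correspond to the quantities plotted in Figures 1 and 2, and their sum to the quantity in Figure 3; other interval types could be compared by swapping in their limits at the marked step.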
Next, we studied the performance of the 95% normal intervals for the ratios $\mu_i/\mu$ and the differences $\mu_i - \mu$; their coverage probabilities were close to 0.95. As an illustration, Figure 5 gives the plots of the number of CIs that do not contain the ratio of the observed age-adjusted mortality rates for Arkansas to the US, for the normal intervals with lower limits truncated at 0 and with the $\ln(\mu_i/\mu)$ transformation. For comparison, we also plotted these numbers for both the $F$ and modified $F$ intervals, ignoring the dependence of $R_i$ on $R$. Figure 5 shows that the $F$ intervals are very conservative, the modified $F$ intervals and the normal intervals based on the logarithmic transformation have coverage probabilities close to 0.95, and the normal intervals with lower limits truncated at 0 are slightly liberal. Of course, neither the $F$ nor the modified $F$ intervals incorporate the crucial dependence between $R_i$ and $R$, and they may not be appropriate in this context.
We also applied the normal CIs for $\mu_i - \mu$ to test whether the 2002 esophagus age-adjusted mortality rate for each of the 51 regions was equal to the US
Figure 5 Number of CIs not covering true ratio of 2002 age-adjusted mortality rates for Arkansas to US for
tongue cancer (over 10 000 replications).
age-adjusted rate, using the 2002 esophagus mortality data. We found that the age-adjusted rates for Ohio and Pennsylvania were different from the US rate when we applied the normal CIs for $\mu_i - \mu$ with correlated $R_i$ and $R$, as the CIs did not contain 0; whereas when we applied the CIs for $\mu_i - \mu$ based on uncorrelated $R_i$ and $R$, the age-adjusted rates for the two states could not be distinguished from that of the US, as the CIs contained 0. For the other 49 regions, the two CIs produced results that were in agreement.
4 Discussion
The advantage of the modified gamma and $F$ intervals is that they depend on all the $w_{ij}$ rather than just the largest value $w_{i(J)}$. The $F$ intervals are based on the ratio of two chi-squared distributions that are independent and, unlike those of Fay,⁹ do not depend on the restrictions $d_{ij} + d_{i'j} = t_j$ for all $j$. Also, the advantage of using the estimates $\tilde\mu_i$, $\tilde\sigma_i^2$ and $\tilde\rho_i$, based on the continuity correction factor, over their counterparts $\hat\mu_i$, $\hat\sigma_i^2$ and $\hat\rho_i$ is greater for the rare cancer sites. Without the continuity correction, the normal and beta CIs for $\mu_i$ for the tongue cancer site were observed to be liberal, especially for the regions with small mortality counts.
In Figures 1–4, we reported the performance of the gamma, modified gamma, beta and normal (with lower limits truncated at 0) intervals. We also studied, but did not report, the performance of the ABC, DKES, and normal intervals for $\mu_i$ based on the
transformations $\ln(-\ln(R_i))$ and $\ln(R_i/(1-R_i))$. We observed that both the gamma and modified gamma intervals always retained the nominal coverage of at least 0.95, with the modified gamma intervals being less conservative than the gamma intervals. None of the other intervals retained the nominal coverage. The DKES intervals came next, with empirical coverage probabilities closer to the nominal value of 0.95, followed by the beta intervals, the ABC intervals, the normal (with lower limits truncated at 0) intervals, the normal intervals based on $\ln(-\ln(R_i))$, and the normal intervals based on $\ln(R_i/(1-R_i))$, in that order. Similarly, for the CIs for the ratios of the means of two (uncorrelated) age-adjusted rates, both the $F$ intervals of Fay⁹ and the modified $F$ intervals retained the nominal coverage of at least 0.95, with the modified $F$ intervals being the less conservative of the two. The normal intervals (with lower limits truncated at 0) have coverage probabilities very close to 0.95, followed by the normal intervals based on the transformation $\ln(R_i/R_{(i)})$.
We may mention that the beta intervals can be viewed as an approximation to Bayesian credible intervals for $\mu_i$. Assume that the $0 < \lambda_{ij} < 1$ are small, so that $D_{ij} \stackrel{ind}{\sim} \mathrm{Bin}(n_{ij}, \lambda_{ij})$. Further assume that the $\lambda_{ij}$ are independent with prior $\pi(\lambda_{ij}) \propto 1$, $0 < \lambda_{ij} < 1$. Then the posterior distributions are given by
$$\lambda_{ij} \mid n_{ij}, r_{ij} \stackrel{ind}{\sim} \mathrm{Be}(n_{ij} r_{ij} + 1,\; n_{ij}(1 - r_{ij}) + 1) \approx \mathrm{Be}(n_{ij} r_{ij},\; n_{ij}(1 - r_{ij}))$$
and we can approximate the posterior means and variances of $\mu_i = \sum_{j=1}^{J} w_{ij}\lambda_{ij}$ by $\tilde r_i$ and $\tilde\nu_i$. Now, the credible intervals can be obtained as follows. Generate $G$ (large) Markov chain Monte Carlo (MCMC) values $\lambda_{ij}^{(g)}$, $g = 1, \ldots, G$, using the Gibbs sampler, from the posterior distributions of the $\lambda_{ij}$, and compute the $G$ values of $\mu_i$, namely $\mu_i^{(g)} = \sum_{j=1}^{J} w_{ij}\lambda_{ij}^{(g)}$. Then construct the $100(1-\alpha)\%$ credible interval for $\mu_i$ from the empirical distribution of $\{\mu_i^{(g)}\}$ by ordering these values from the smallest to the largest and taking the limits of the credible interval to be the $100(\alpha/2)$th and $100(1-\alpha/2)$th percentiles of the ordered values. We performed MCMC simulations and constructed the credible intervals for the 2002 age-adjusted mortality rates for tongue cancer for the 51 regions of the US and found that the credible intervals were more liberal than the beta intervals in Table 1.
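The percentile construction of the credible interval can be illustrated with a short Python sketch. It is a simplified illustration, not the MCMC code used for the results above: because the posteriors are independent beta distributions, the sketch draws from them directly with the standard library instead of running a Gibbs sampler, and the function name `beta_credible_interval` and all input values are hypothetical.

```python
import random


def beta_credible_interval(n, r, w, alpha=0.05, draws=20000, seed=0):
    """Percentile credible interval for mu = sum_j w[j] * lambda_j.

    Each lambda_j is drawn from its Be(n_j r_j + 1, n_j (1 - r_j) + 1)
    posterior (flat prior), independently across age groups j.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        mu = sum(
            wj * rng.betavariate(nj * rj + 1.0, nj * (1.0 - rj) + 1.0)
            for nj, rj, wj in zip(n, r, w)
        )
        sims.append(mu)
    sims.sort()  # order the simulated values from smallest to largest
    lower = sims[int((alpha / 2) * draws)]
    upper = sims[int((1 - alpha / 2) * draws) - 1]
    return lower, upper


# Hypothetical populations, observed rates and weights for three age groups.
n = [50000, 40000, 10000]
r = [2e-5, 1e-4, 5e-4]
w = [0.5, 0.3, 0.2]
print(beta_credible_interval(n, r, w))
```

Direct sampling is chosen here only because it keeps the example self-contained; it targets the same posterior distribution of $\mu_i$ that the Gibbs sampler would.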
The assumption that the mortality or incidence counts are independent Poisson is used by many (for example, see Brillinger³) and is perhaps a consequence of the underlying birth/death (continuous) Poisson process model. We have not seen any analyses of age-adjusted rates for the case of correlated $D_{ij}$. However, as pointed out by a referee, it is quite possible for neighboring states to share socio-economic and other factors, resulting in correlated $D_{ij}$. This is an important topic for future research.
5 Conclusion
We presented CIs for the means of the age-adjusted cancer rates for the 51 regions, $\mu_i$, for the US, $\mu$, for the ratios of the means $\mu_i/\mu_{i'}$, $\mu_i/\mu_{(i)}$ and $\mu_i/\mu$, and for the differences $\mu_i - \mu$. We developed modifications of the gamma interval of Fay and Feuer⁴ and
the $F$ interval of Fay,⁹ and proposed new CIs based on the beta and normal intervals. Simulations were carried out to compare the performance of these intervals in terms of their empirical coverage probabilities, and the results showed that the modified gamma and $F$ intervals performed better than the gamma interval of Fay and Feuer⁴ and the $F$ interval of Fay⁹ in terms of retaining the nominal coverage. The other intervals, such as the DKES, ABC, beta, and truncated normal intervals, were shown to be good competitors. The modified gamma and $F$ intervals are going to replace the gamma and $F$ intervals in the SEER Program. In addition, for comparing $\mu_i$ and $\mu$, the normal intervals for $\mu_i - \mu$ that incorporate the correlation between $R_i$ and $R$ are also recommended to replace the ones that are based on uncorrelated $R_i$ and $R$ in the SEER Program.¹⁰ Even though the results of this paper are presented in the context of constructing the CIs for the (true) age-adjusted mortality rates based on data from the SEER Program, they can be applied to similar data from other countries as well.
Acknowledgements
The authors would like to thank the editor and the two referees for their valuable
comments that led to a significant improvement of the original manuscript.
References
1 Jemal A, Murray T, Ward E, Samuels A, Tiwari RC, Ghafoor A, Feuer EJ, Thun MJ. Cancer Statistics 2005. CA A Cancer Journal for Clinicians 2005; 55: 1–22.
2 American Cancer Society. Cancer Facts & Figures 2005.
3 Brillinger DR. The natural variability of vital rates and associated statistics (with discussion). Biometrics 1986; 42: 693–734.
4 Fay MP, Feuer EJ. Confidence intervals for directly standardized rates: a method based on the gamma distribution. Statistics in Medicine 1997; 16: 791–801.
5 DiCiccio T, Efron B. More accurate confidence intervals in exponential families. Biometrika 1992; 79: 231–45.
6 Efron B, Tibshirani RJ. An introduction to the bootstrap. Chapman & Hall, 1993.
7 Swift MB. Simple confidence intervals for standardized rates based on the approximate bootstrap method. Statistics in Medicine 1995; 14: 1875–88.
8 Dobson AJ, Kuulasmaa K, Ederle E, Scherer J. Confidence intervals for weighted sums of Poisson parameters. Statistics in Medicine 1991; 10: 457–62.
9 Fay MP. Approximate confidence intervals for rate ratios from directly standardized rates with sparse data. Communications in Statistics Theory and Methods 1999; 28: 2141–60.
10 Ries LAG, Eisner MP, Kosary CL, Hankey BF, Miller BA, Clegg LX, Edwards BK. SEER Cancer Statistics Review, 1973–1997. National Cancer Institute (NIH Pub. No. 00-2789), 2000.
11 Johnson NL, Kotz S. Continuous univariate distributions I. Wiley, 1969.
12 Bickel PJ, Doksum KA. Mathematical statistics: basic ideas and selected topics. Holden-Day, Inc., 1977.
13 Breslow NE, Day NE. Statistical methods in cancer research, volume II: the design and analysis of cohort studies. Oxford University Press, 1987.
14 Casella G, Berger RL. Statistical inference. Wadsworth & Brooks/Cole Advanced Books & Software, 1990.
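Appendix A: Means and variances of $R_i$, $R_{(i)}$ and $R$, and of their ratios, and the covariance between $R_i$ and $R$
We can rewrite $R_i$, $R_{(i)}$ and $R$ as
$$R_i = \frac{1}{n}\sum_{j=1}^{J}\frac{w_j D_{ij}}{\xi_j \xi_{ij}}; \quad R_{(i)} = \frac{1}{n}\sum_{j=1}^{J}\frac{w_j D_{(i)j}}{\xi_j \xi_{(i)j}}; \quad R = \frac{1}{n}\sum_{j=1}^{J}\frac{w_j D_j}{\xi_j}$$
Let
$$\sigma_i^2 = \sum_{j=1}^{J} w_j^2 \frac{\lambda_{ij}}{\xi_j \xi_{ij}}; \quad \sigma_{(i)}^2 = \sum_{j=1}^{J} w_j^2 \frac{\sum_{i' \ne i} \xi_{i'j}\lambda_{i'j}}{\xi_j \xi_{(i)j}^2};$$
$$\sigma^2 = \sum_{j=1}^{J} w_j^2 \frac{\sum_{i=1}^{I} \xi_{ij}\lambda_{ij}}{\xi_j}; \quad \rho_i = \sum_{j=1}^{J} w_j^2 \frac{\lambda_{ij}}{\xi_j}$$
Then,
$$\mu_i \equiv E(R_i) = \sum_{j=1}^{J} w_j \lambda_{ij}; \quad \mu_{(i)} \equiv E(R_{(i)}) = \sum_{j=1}^{J} w_j \frac{\sum_{i' \ne i} \xi_{i'j}\lambda_{i'j}}{\xi_{(i)j}};$$
$$\mu \equiv E(R) = \sum_{j=1}^{J} w_j \sum_{i=1}^{I} \xi_{ij}\lambda_{ij}$$
$$\nu_i \equiv \mathrm{Var}(R_i) = \frac{\sigma_i^2}{n}; \quad \nu_{(i)} = \frac{\sigma_{(i)}^2}{n}; \quad \nu \equiv \mathrm{Var}(R) = \frac{\sigma^2}{n}; \quad \mathrm{Cov}(R_i, R) = \frac{\rho_i}{n}$$
Using the delta-method, the means and variances of the ratios $R_i/R_{i'}$, $R_i/R_{(i)}$ and $R_i/R$ are given by
$$E\left(\frac{R_i}{R_{i'}}\right) \approx \frac{\mu_i}{\mu_{i'}}; \quad E\left(\frac{R_i}{R_{(i)}}\right) \approx \frac{\mu_i}{\mu_{(i)}}; \quad E\left(\frac{R_i}{R}\right) \approx \frac{\mu_i}{\mu}$$
$$\mathrm{Var}\left(\frac{R_i}{R_{i'}}\right) \approx \frac{\sigma_i^2\mu_{i'}^2 + \sigma_{i'}^2\mu_i^2}{n\mu_{i'}^4}; \quad \mathrm{Var}\left(\frac{R_i}{R_{(i)}}\right) \approx \frac{\sigma_i^2\mu_{(i)}^2 + \sigma_{(i)}^2\mu_i^2}{n\mu_{(i)}^4}$$
$$\mathrm{Var}\left(\frac{R_i}{R}\right) \approx \frac{\sigma_i^2\mu^2 + \sigma^2\mu_i^2 - 2\rho_i\mu_i\mu}{n\mu^4}$$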
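As a numerical illustration of the Appendix A quantities, the following Python sketch evaluates $\mu_i$, $\sigma_i^2$ and $\rho_i$ for one region and the delta-method variance of $R_i/R$. The function names `region_terms` and `ratio_variance` and all input values are hypothetical and serve only to show how the formulas fit together.

```python
def region_terms(w, xi, xi_i, lam_i):
    """Return (mu_i, sigma_i^2, rho_i) for one region.

    w     : standard weights w_j
    xi    : xi_j, one value per age group
    xi_i  : xi_ij for the region, one value per age group
    lam_i : lambda_ij for the region, one value per age group
    """
    mu_i = sum(wj * l for wj, l in zip(w, lam_i))
    sigma2_i = sum(wj ** 2 * l / (x * xij)
                   for wj, l, x, xij in zip(w, lam_i, xi, xi_i))
    rho_i = sum(wj ** 2 * l / x for wj, l, x in zip(w, lam_i, xi))
    return mu_i, sigma2_i, rho_i


def ratio_variance(mu_i, mu, sigma2_i, sigma2, rho_i, n):
    """Delta-method variance of R_i / R, including the covariance term."""
    return (sigma2_i * mu ** 2 + sigma2 * mu_i ** 2
            - 2 * rho_i * mu_i * mu) / (n * mu ** 4)


# Hypothetical two-age-group example.
print(region_terms([0.5, 0.5], [0.5, 0.5], [0.2, 0.2], [0.1, 0.3]))
print(ratio_variance(2.0, 1.0, 1.0, 1.0, 0.5, 1))
```

Setting `rho_i` to 0 in `ratio_variance` gives the variance that would be used if $R_i$ and $R$ were treated as uncorrelated, so a positive $\rho_i$ always reduces the variance of the ratio.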
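Appendix B: ABC and DKES intervals
The ABC intervals are⁴
$$L_{ABC}(\mu_i;\alpha) = \hat\mu_i + \frac{z_{0i} - z_{\alpha/2}}{\{1 - a_i[z_{0i} - z_{\alpha/2}]\}^2}\,\frac{\hat\sigma_i}{\sqrt{n}}$$
$$U_{ABC}(\mu_i;\alpha) = \hat\mu_i + \frac{z_{0i} + z_{\alpha/2}}{\{1 - a_i[z_{0i} + z_{\alpha/2}]\}^2}\,\frac{\hat\sigma_i}{\sqrt{n}}$$
where $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$ is the upper $\alpha/2$th percentile point of the standard normal distribution function $\Phi$, and $a_i = z_{0i} = \left(\sum_{j=1}^{J} w_{ij}^3 d_{ij}\right)/(6\hat\nu_i^{3/2})$. The DKES intervals are⁴
$$L_{DKES}(\mu_i;\alpha) = \hat\mu_i + \frac{\hat\sigma_i}{\sqrt{n\sum_{j=1}^{J} d_{ij}}}\left[\frac{1}{2}\left(\chi^2_{2\left(\sum_{j=1}^{J} d_{ij}\right)}\right)^{-1}\left(\frac{\alpha}{2}\right) - \sum_{j=1}^{J} d_{ij}\right]$$
$$U_{DKES}(\mu_i;\alpha) = \hat\mu_i + \frac{\hat\sigma_i}{\sqrt{n\sum_{j=1}^{J} d_{ij}}}\left[\frac{1}{2}\left(\chi^2_{2\left(1 + \sum_{j=1}^{J} d_{ij}\right)}\right)^{-1}\left(1 - \frac{\alpha}{2}\right) - \sum_{j=1}^{J} d_{ij}\right]$$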
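The ABC limits can be evaluated with the Python standard library alone, as in the sketch below. It assumes the usual plug-in estimates $\hat\mu_i = \sum_j w_{ij} d_{ij}$ and $\hat\nu_i = \hat\sigma_i^2/n = \sum_j w_{ij}^2 d_{ij}$, which are not restated in this appendix; the function name `abc_interval` and the input values are hypothetical. The DKES limits are omitted because they need a chi-squared quantile function, which the standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist


def abc_interval(w, d, alpha=0.05):
    """ABC limits for a weighted sum of Poisson counts.

    w : weights w_ij for one region
    d : observed counts d_ij for one region
    Assumes mu_hat = sum(w*d) and nu_hat = sum(w^2*d) (plug-in estimates).
    """
    mu_hat = sum(wj * dj for wj, dj in zip(w, d))
    nu_hat = sum(wj ** 2 * dj for wj, dj in zip(w, d))
    a = sum(wj ** 3 * dj for wj, dj in zip(w, d)) / (6 * nu_hat ** 1.5)
    z0 = a                                  # a_i = z_0i in the formula above
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(nu_hat)                       # equals sigma_hat_i / sqrt(n)
    lower = mu_hat + (z0 - z) / (1 - a * (z0 - z)) ** 2 * se
    upper = mu_hat + (z0 + z) / (1 - a * (z0 + z)) ** 2 * se
    return lower, upper


# Hypothetical weights and counts for three age groups.
print(abc_interval([1e-6, 2e-6, 3e-6], [10, 20, 5]))
```

Because $a_i > 0$ whenever some count is positive, the upper limit lies further from $\hat\mu_i$ than the lower limit, which reflects the right skewness of the weighted Poisson sum.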
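Appendix C: Asymptotic normality and confidence intervals based on $R_{ij}$
Let $\mathbf{R} = (R_{11}, \ldots, R_{1J}, \ldots, R_{I1}, \ldots, R_{IJ})^T$, $\bar{\mathbf{R}} = (R_1, \ldots, R_I, R)^T$, $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_I, \mu)^T$ and let $\boldsymbol{\Sigma} = ((\sigma_{ii'}))$ be the $(I+1) \times (I+1)$ matrix with $\sigma_{ii} = \sigma_i^2$, $\sigma_{i,I+1} = \sigma_{I+1,i} = \rho_i$ and $\sigma_{ii'} = 0$ for $i \ne i'$. Here the superscript $T$ denotes the transpose. Since $\bar{\mathbf{R}}$ can be expressed as $\bar{\mathbf{R}} = A\mathbf{R}$ for an appropriately defined matrix $A$, we have
$$\sqrt{n}(\bar{\mathbf{R}} - \boldsymbol{\mu}) \longrightarrow N_{(I+1)}(\mathbf{0}, \boldsymbol{\Sigma})$$
where $\boldsymbol{\Sigma} = A[\mathrm{Cov}(\mathbf{R})]A^T$ and $N_p(b, B)$ denotes a $p$-dimensional multivariate normal distribution.
Thus for any non-null $(I+1)$-column vector $a$,
$$\sqrt{n}\,a^T(\bar{\mathbf{R}} - \boldsymbol{\mu}) \longrightarrow N(0, a^T\boldsymbol{\Sigma}a)$$
In particular, by choosing $a$ appropriately, we have
$$R_i = \sum_{j=1}^{J} w_j R_{ij} \stackrel{ind}{\sim} AN\left(\mu_i, \frac{\sigma_i^2}{n}\right)$$
$$R_{(i)} = \sum_{j=1}^{J} w_j \frac{\sum_{i' \ne i}\xi_{i'j}R_{i'j}}{\xi_{(i)j}} \sim AN\left(\mu_{(i)}, \frac{\sigma_{(i)}^2}{n}\right)$$
$$R = \sum_{j=1}^{J} w_j \sum_{i=1}^{I}\xi_{ij}R_{ij} \sim AN\left(\mu, \frac{\sigma^2}{n}\right)$$
$$\frac{R_i}{R_{i'}} \sim AN\left(\frac{\mu_i}{\mu_{i'}}, \frac{\sigma_i^2\mu_{i'}^2 + \sigma_{i'}^2\mu_i^2}{n\mu_{i'}^4}\right); \quad \frac{R_i}{R_{(i)}} \sim AN\left(\frac{\mu_i}{\mu_{(i)}}, \frac{\sigma_i^2\mu_{(i)}^2 + \sigma_{(i)}^2\mu_i^2}{n\mu_{(i)}^4}\right)$$
$$\frac{R_i}{R} \sim AN\left(\frac{\mu_i}{\mu}, \frac{\sigma_i^2\mu^2 + \sigma^2\mu_i^2 - 2\rho_i\mu_i\mu}{n\mu^4}\right)$$
$$(R_i - R) \sim AN\left(\mu_i - \mu, \frac{\sigma_i^2 + \sigma^2 - 2\rho_i}{n}\right)$$
The resulting normal CIs are
$$\mu_i = \left[\hat\mu_i \pm z_{\alpha/2}\frac{\hat\sigma_i}{\sqrt{n}}\right] \vee 0; \quad \mu = \left[\hat\mu \pm z_{\alpha/2}\frac{\hat\sigma}{\sqrt{n}}\right] \vee 0$$
$$\frac{\mu_i}{\mu_{i'}} = \left[\frac{\hat\mu_i}{\hat\mu_{i'}} \pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2\hat\mu_{i'}^2 + \hat\sigma_{i'}^2\hat\mu_i^2}{n\hat\mu_{i'}^4}}\right] \vee 0$$
$$\frac{\mu_i}{\mu} = \left[\frac{\hat\mu_i}{\hat\mu} \pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2\hat\mu^2 + \hat\sigma^2\hat\mu_i^2 - 2\hat\rho_i\hat\mu_i\hat\mu}{n\hat\mu^4}}\right] \vee 0$$
$$\frac{\mu_i}{\mu_{(i)}} = \left[\frac{\hat\mu_i}{\hat\mu_{(i)}} \pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2\hat\mu_{(i)}^2 + \hat\sigma_{(i)}^2\hat\mu_i^2}{n\hat\mu_{(i)}^4}}\right] \vee 0$$
$$\mu_i - \mu = \hat\mu_i - \hat\mu \pm z_{\alpha/2}\sqrt{\frac{\hat\sigma_i^2 + \hat\sigma^2 - 2\hat\rho_i}{n}}$$
where $a \vee b = \max(a, b)$.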
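The effect of the covariance term on the normal CIs for $\mu_i - \mu$ and $\mu_i/\mu$ can be seen in a small Python sketch. The function names `diff_ci` and `ratio_ci` and the input values are hypothetical, chosen only so that the interval for the difference excludes 0 when the correlation is included and contains 0 when it is ignored.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(mu_i, mu, sigma2_i, sigma2, rho_i, n, alpha=0.05):
    """Normal CI for mu_i - mu; set rho_i = 0 to ignore the correlation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt((sigma2_i + sigma2 - 2 * rho_i) / n)
    est = mu_i - mu
    return est - half, est + half


def ratio_ci(mu_i, mu, sigma2_i, sigma2, rho_i, n, alpha=0.05):
    """Normal CI for mu_i / mu with both limits truncated below at 0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    var = (sigma2_i * mu ** 2 + sigma2 * mu_i ** 2
           - 2 * rho_i * mu_i * mu) / (n * mu ** 4)
    est = mu_i / mu
    half = z * sqrt(var)
    return max(est - half, 0.0), max(est + half, 0.0)


# Hypothetical plug-in estimates for one region and the US.
print(diff_ci(1.4, 1.0, 4.0, 1.0, rho_i=1.0, n=100))  # correlation included
print(diff_ci(1.4, 1.0, 4.0, 1.0, rho_i=0.0, n=100))  # correlation ignored
print(ratio_ci(1.4, 1.0, 4.0, 1.0, rho_i=1.0, n=100))
```

With a positive $\hat\rho_i$ the interval for the difference is always narrower than its uncorrelated counterpart, which is how the two versions can lead to different conclusions for the same data.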
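Since $0 \le R_i \le 1$ and $0 \le R_i/R_{(i)} \le \infty$ with probability 1, the following transformations are commonly used to transform the range of these random variables to $(-\infty, \infty)$, and their results on the asymptotic normality yield:
$$\ln(-\ln R_i) \sim AN\left(\ln(-\ln(\mu_i)), \frac{\sigma_i^2}{n(\mu_i \ln \mu_i)^2}\right)$$
$$\mathrm{logit}(R_i) \equiv \ln\frac{R_i}{1 - R_i} \sim AN\left(\ln\frac{\mu_i}{1 - \mu_i}, \frac{\sigma_i^2}{n(\mu_i(1 - \mu_i))^2}\right)$$
$$\ln\frac{R_i}{R_{(i)}} \sim AN\left(\ln\frac{\mu_i}{\mu_{(i)}}, \frac{1}{n}\left(\frac{\sigma_i^2}{\mu_i^2} + \frac{\sigma_{(i)}^2}{\mu_{(i)}^2}\right)\right)$$
Based on these transformations, the CIs for $\mu_i$, $\mu_i/\mu_{(i)}$ and $\mu_i/\mu$ are given as follows:
I)
$$\mu_i = \exp\left\{-\exp\left[\ln(-\ln(\hat\mu_i)) \pm z_{\alpha/2}\frac{\hat\sigma_i}{(\hat\mu_i \ln \hat\mu_i)\sqrt{n}}\right]\right\}$$
II)
$$\mu_i = \left[1 + \exp\left\{-\left[\ln\frac{\hat\mu_i}{1 - \hat\mu_i} \pm z_{\alpha/2}\frac{\hat\sigma_i}{(\hat\mu_i(1 - \hat\mu_i))\sqrt{n}}\right]\right\}\right]^{-1}$$
III)
$$\frac{\mu_i}{\mu_{(i)}} = \exp\left\{\ln\frac{\hat\mu_i}{\hat\mu_{(i)}} \pm z_{\alpha/2}\left(\frac{1}{n}\left(\frac{\hat\sigma_i^2}{\hat\mu_i^2} + \frac{\hat\sigma_{(i)}^2}{\hat\mu_{(i)}^2}\right)\right)^{1/2}\right\}$$
IV)
$$\frac{\mu_i}{\mu} = \exp\left\{\ln\frac{\hat\mu_i}{\hat\mu} \pm z_{\alpha/2}\frac{\hat\mu}{\hat\mu_i}\left(\frac{1}{n}\,\frac{\hat\sigma_i^2\hat\mu^2 + \hat\sigma^2\hat\mu_i^2 - 2\hat\rho_i\hat\mu\hat\mu_i}{\hat\mu^4}\right)^{1/2}\right\}$$
The CIs in III) above were also derived by Breslow and Day.¹³ Note that we will use $\tilde\mu_i$, $\tilde\nu_i$ and $\tilde\rho_i$ instead of $\hat\mu_i$, $\hat\nu_i$ and $\hat\rho_i$.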
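The transformed intervals I), II) and IV) can be computed as in the following Python sketch. The function names and input values are hypothetical; `sigma_hat` denotes $\hat\sigma_i$, so that the standard error of the rate is `sigma_hat / sqrt(n)`.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def loglog_ci(mu_hat, sigma_hat, n, alpha=0.05):
    """Interval I): back-transformed CI on the ln(-ln) scale."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = log(-log(mu_hat))
    half = z * sigma_hat / (abs(mu_hat * log(mu_hat)) * sqrt(n))
    # exp(-exp(t)) is decreasing in t, so centre + half gives the lower limit.
    return exp(-exp(centre + half)), exp(-exp(centre - half))


def logit_ci(mu_hat, sigma_hat, n, alpha=0.05):
    """Interval II): back-transformed CI on the logit scale."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = log(mu_hat / (1 - mu_hat))
    half = z * sigma_hat / (mu_hat * (1 - mu_hat) * sqrt(n))
    return 1 / (1 + exp(-(centre - half))), 1 / (1 + exp(-(centre + half)))


def log_ratio_ci(mu_i, mu, sigma2_i, sigma2, rho_i, n, alpha=0.05):
    """Interval IV): CI for mu_i / mu on the log scale, with correlation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    var_ratio = (sigma2_i * mu ** 2 + sigma2 * mu_i ** 2
                 - 2 * rho_i * mu * mu_i) / (n * mu ** 4)
    half = z * (mu / mu_i) * sqrt(var_ratio)
    est = mu_i / mu
    return est * exp(-half), est * exp(half)


# Hypothetical estimates: a rate of 10 per 100 000 with standard error 2e-5.
print(loglog_ci(1e-4, 2e-3, 10000))
print(logit_ci(1e-4, 2e-3, 10000))
print(log_ratio_ci(1.4, 1.0, 4.0, 1.0, 1.0, 100))
```

Unlike the truncated normal intervals, these limits stay inside the natural range of the parameter by construction, so no truncation at 0 is needed.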
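Appendix D: Beta approximations of $R_{ij}$ and $R_i$
Using the relation that¹⁴
$$\sum_{k=0}^{x}\binom{n}{k}p^k(1-p)^{n-k} = \frac{\Gamma(n+1)}{\Gamma(x+1)\Gamma(n-x)}\int_0^{1-p} t^{n-x-1}(1-t)^x\,dt$$
$$= \int_0^{1-p} B(t \mid n-x, x+1)\,dt$$
$$= \int_p^1 B(t \mid x+1, n-x)\,dt$$
It then follows that
$$P(R_{ij} > r_{ij} \mid (D_{ij} + \bar D_{ij}) = n_{ij}, \lambda_{ij}) = \int_0^{\lambda_{ij}} B(t \mid n_{ij}r_{ij} + 1, n_{ij}(1 - r_{ij}))\,dt$$
Another heuristic argument for the beta approximation for $R_{ij}$ is based on the gamma or chi-squared approximation of a Poisson distribution. Let $\chi^2_k$ and $\alpha\chi^2_k$ denote a chi-squared random variable with $k$ degrees of freedom and a re-scaled (by a factor $\alpha > 0$) $\chi^2_k$ random variable. Note that $\chi^2_k \stackrel{d}{=} G(k/2, 1)$, and if $\chi^2_r$ and $\chi^2_s$ are independent, $\chi^2_r/(\chi^2_r + \chi^2_s) \stackrel{d}{=} \chi^2_r/\chi^2_{r+s} \sim \mathrm{Be}(r/2, s/2)$, and that $\chi^2_r/(\chi^2_r + \chi^2_s)$ and $\chi^2_r + \chi^2_s$ are independent with $\chi^2_r + \chi^2_s \stackrel{d}{=} \chi^2_{r+s}$.
Since $D_{ij}$ and $\bar D_{ij}$ are independent, distributed as $\mathrm{Po}(n_{ij}\lambda_{ij})$ and $\mathrm{Po}(n_{ij}(1 - \lambda_{ij}))$, respectively, and their distributions can be approximated by independent chi-squared distributions $\tfrac{1}{2}\chi^2_{2([n_{ij}r_{ij}]+1)}$ and $\tfrac{1}{2}\chi^2_{2(n_{ij} - [n_{ij}r_{ij}])}$, where $[x]$ denotes the integer value of $x$, we have
$$\frac{D_{ij}}{D_{ij} + \bar D_{ij}} \approx \frac{\tfrac{1}{2}\chi^2_{2([n_{ij}r_{ij}]+1)}}{\tfrac{1}{2}\chi^2_{2([n_{ij}r_{ij}]+1)} + \tfrac{1}{2}\chi^2_{2(n_{ij} - [n_{ij}r_{ij}])}}$$
$$= \frac{\chi^2_{2([n_{ij}r_{ij}]+1)}}{\chi^2_{2([n_{ij}r_{ij}]+1)} + \chi^2_{2(n_{ij} - [n_{ij}r_{ij}])}} \sim \mathrm{Be}([n_{ij}r_{ij}] + 1,\; n_{ij} - [n_{ij}r_{ij}]).$$
Thus, $R_{ij} \approx \mathrm{Be}([n_{ij}r_{ij}] + 1,\; n_{ij} - [n_{ij}r_{ij}])$. We can now approximate the distribution of $R_i = \sum_{j=1}^{J} w_j R_{ij}$ by a beta distribution, $\mathrm{Be}(\hat a_i, \hat b_i)$, where
$$\hat a_i = \tilde r_i\left\{\frac{\tilde r_i(1 - \tilde r_i)}{\tilde\nu_i} - 1\right\}, \quad \hat b_i = (1 - \tilde r_i)\left\{\frac{\tilde r_i(1 - \tilde r_i)}{\tilde\nu_i} - 1\right\}$$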
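The parameters $\hat a_i$ and $\hat b_i$ are the method-of-moments match of a beta distribution to a given mean and variance, which can be checked with a short Python sketch. The function name `beta_params` and the input values are hypothetical; the inputs `r` and `nu` stand for $\tilde r_i$ and $\tilde\nu_i$.

```python
def beta_params(r, nu):
    """Method-of-moments beta parameters for mean r and variance nu.

    Requires 0 < r < 1 and 0 < nu < r * (1 - r), so that both parameters
    are positive.
    """
    if not (0.0 < r < 1.0 and 0.0 < nu < r * (1.0 - r)):
        raise ValueError("need 0 < r < 1 and 0 < nu < r(1 - r)")
    common = r * (1.0 - r) / nu - 1.0
    return r * common, (1.0 - r) * common


# Hypothetical mean and variance; the implied beta moments reproduce them.
a, b = beta_params(0.2, 0.01)
mean = a / (a + b)
var = a * b / ((a + b) ** 2 * (a + b + 1.0))
print(a, b, mean, var)
```

The limits of the beta interval would then be the $\alpha/2$ and $1 - \alpha/2$ quantiles of $\mathrm{Be}(\hat a_i, \hat b_i)$; these need a beta quantile function from a numerical library and are not computed here.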