Design and Evaluation of Prophylactic

Interventions Using Infectious Disease

Incidence Data from Close Contact Groups

Yang Yang1, Ira M. Longini, Jr.1, M. Elizabeth Halloran1

Technical Report 04-09

July 22, 2004

Department of Biostatistics

Rollins School of Public Health

Emory University

Atlanta, Georgia 30322

Department of Biostatistics, Rollins School of Public Health,

Emory University, 1518 Clifton Road NE, Atlanta, GA 30322 USA

Telephone: (404)727-9169 FAX: (404)727-1370

e-mail: yyang3@sph.emory.edu

Summary. Prophylaxis of contacts of infectious cases such as household

members and treatment of infectious cases are methods to prevent spread

of infectious diseases. We develop study designs and statistical methods for

estimating the efficacy of such interventions in reducing susceptibility and

infectiousness as well as for estimating the transmission probabilities. We

consider both the design with prospective follow up of close contact groups

and the design with ascertainment of close contact groups by an index case.

Randomization by groups and by individuals are compared. We develop two

methods for estimating the efficacy and transmission probabilities for each

design. The first uses maximum likelihood, and the second uses a generalized

linear models framework estimated by iteratively re-weighted least squares

with the EM algorithm. We develop a method to deal with the left truncation

of the case-ascertained follow up design. We use these methods to compare

the designs using simulations and to analyze data from a trial of an antiviral

agent in preventing influenza in household contacts.

Key words: Infectious disease; Intervention efficacy; Community trial;

Antiviral agent; Left truncation; Linear model

1. Introduction

Transmission of many infectious diseases takes place mainly through close

contacts in mixing groups such as households, daycare centers, schools, and

the workplace, and to a lesser extent through casual contacts in the commu-

nity at large. Data from clinical studies based on close contact groups offer a

basis for estimating person-to-person and community-to-person transmission

probabilities and, more importantly, for evaluating the effectiveness of pro-

phylactic interventions such as vaccination and antiviral agents. Estimation

of the person-to-person transmission probabilities within the close contact

groups is conditioned on exposure to infection and, thus, can be used to es-

timate the effect of the intervention on both reducing susceptibility to infec-

tion and reducing transmission to others (Halloran, Longini and Struchiner,

1999). A good study design improves the quality of the information as well

as reduces both the length of the observation period and the number of close

contact groups needed to assess effectiveness of the intervention. Key study

design elements include randomization scheme and ascertainment method.

Typically, in infectious disease studies, the intervention product and placebo

are randomized either at the individual level within the close contact groups,

where each person is randomized independently, or at the group level, where

participants in the same groups receive either the intervention product or

placebo (Hayes et al., 1995; Donner, 1998). The two randomization schemes

may result in substantially different precision of the parameter estimates

(Datta, Halloran and Longini, 1999). Unlike the randomization issue, the

method to recruit and to follow close contact groups is intuitively related

to the size of a clinical trial. For example, a prospective trial generally has

complete observations in the sense that groups free of disease are enrolled

at the beginning of an epidemic season and then followed to some predeter-

mined end point. For a case-ascertained follow-up trial, close contact groups

are enrolled for observation if and only if an index case is ascertained. The

case-ascertained trial size would be much smaller than that of a prospective

follow-up study with the same number of cases, but at the price of poten-

tial bias due to left truncation of the infection status of the non-index cases.

These later cases could already be infected at the time of ascertainment of

the close contact group, but not yet showing symptoms. If this left trunca-

tion can be dealt with, then the case-ascertained trials may be preferable to

the larger prospective trials.

Many methods are available for analyzing clinical trial data of acute in-

fectious diseases based on close contact groups (Becker, 1989). Longini and

Koopman (1982), Longini et al. (1988), Addy, Longini and Haber (1991)

and Magder and Brookmeyer (1993) developed methods that use only final

infection status of individuals within each close contact group. Rampey et al.

(1992) developed a method for time of onset data for prospective trials, but

not for case-ascertained trials.

In this paper, we develop two estimation procedures for both prospective

and case-ascertained clinical trials in close contact groups using likelihood-

based methods and generalized linear models fitted by iteratively re-weighted

least squares in combination with the EM algorithm. Individual- and group-

level randomization schemes as well as prospective and case-ascertained de-

signs are compared using simulations. The approaches are generalized to

stratified populations that include discrete covariates. We use these new

methods to estimate the prophylactic and treatment effectiveness of an in-

fluenza antiviral agent in two household trials.

2. Methods

Suppose, without loss of generality, that influenza is the infectious disease of

interest and that the close contact groups are households. In addition, the in-

tervention of interest is the prophylactic use of influenza antiviral agents. We

define two types of potentially infectious contact: (1) being in the household

with another infected person, and (2) making contact with possibly infected

people outside of the household. We define p as the daily transmission prob-

ability per contact within the household between a susceptible person and an

infective person if both have not received antiviral agents. Similarly, define

b as the daily probability that a susceptible and untreated person is infected

by a source of infection from the community. The antiviral efficacy for susceptibility to infection and illness ($\mathrm{AVE}_S$) measures how much an antiviral agent will relatively reduce the probability that an uninfected person will be infected and ill, when exposed to infection, compared to an uninfected person not using an antiviral agent. Then $\mathrm{AVE}_S = 1 - \theta$, where $\theta p$ is the reduced transmission probability if the susceptible person is taking an antiviral agent and exposed to an untreated infected person in the household. For simplicity, we assume that efficacy is the same for contacts outside the household, i.e., the reduced transmission probability for a person taking an antiviral agent is $\theta b$. The antiviral efficacy for infectiousness ($\mathrm{AVE}_I$) is how much an antiviral agent will relatively reduce the probability that an infected person will transmit influenza to others compared to an infected person who is not using an antiviral agent. Then, $\mathrm{AVE}_I = 1 - \phi$, where $\phi p$ is the reduced transmission probability if the infective person is treated. If both people of a transmission pair are treated, we assume independence and multiplicativity of $\theta$ and $\phi$ so that the transmission probability reduces to $\theta\phi p$. We make the following assumptions about influenza:

1. The latent period (i.e., time from infection to being infectious) is the same as the incubation period (i.e., time from infection to the onset of illness symptoms).

2. The probability distributions of the lengths of the latent and the infectious periods are known.

2.1 Maximum likelihood estimates

We start with the model for the prospective follow-up study. Without loss of generality, let the trial start on day 1 and end on day $T$. Let $\tilde{t}_i$ denote the day of illness onset for an infected person $i$. We let $r_i(t)$ (0 = untreated, 1 = treated) indicate the treatment status of person $i$ on day $t$. The probability that a susceptible person $i$ escapes infection by an infective family member $j$ on day $t$ is given by

$$q_{ij}(t) = 1 - \theta^{r_i(t)} \phi^{r_j(t)} p f(t \mid \tilde{t}_j), \tag{1}$$

where the function $f(t \mid \tilde{t}_j)$ is the probability that person $j$ is infectious on day $t$ given the day of illness onset $\tilde{t}_j$. It follows that

$$e_i(t) = \bigl(1 - \theta^{r_i(t)} b\bigr) \prod_{j \in D_i} q_{ij}(t) \tag{2}$$

and

$$Q_i(t) = \prod_{\tau=1}^{t} e_i(\tau) \tag{3}$$

are the escape probabilities for person $i$ on day $t$ and up to day $t$, respectively, where $D_i$ is the set of people in the same household with person $i$. The probability that person $i$ is infected on day $t$ is given by

$$Z_i(t) = Q_i(t-1)\bigl(1 - e_i(t)\bigr).$$
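The escape probabilities in (1) to (3) and the daily infection probability can be sketched in code as follows. This is a minimal illustration rather than the authors' implementation: the function names and the representation of the household on a given day as a list of `(r_j, f_jt)` pairs are assumptions made here.

```python
from math import prod

def pair_escape(p, theta, phi, r_i, r_j, f_jt):
    """q_ij(t) in (1): person i escapes infection by housemate j on day t.
    f_jt is the probability that j is infectious on day t."""
    return 1.0 - (theta ** r_i) * (phi ** r_j) * p * f_jt

def daily_escape(b, p, theta, phi, r_i, housemates):
    """e_i(t) in (2). housemates is a hypothetical list of (r_j, f_jt)
    pairs, one per other household member, for the day in question."""
    community = 1.0 - (theta ** r_i) * b
    return community * prod(
        pair_escape(p, theta, phi, r_i, r_j, f_jt) for r_j, f_jt in housemates
    )

def cumulative_escape(e_by_day, t):
    """Q_i(t) in (3); e_by_day[0] holds e_i(1), so Q_i(0) = 1."""
    return prod(e_by_day[:t])

def infection_prob(e_by_day, t):
    """Z_i(t) = Q_i(t - 1) * (1 - e_i(t))."""
    return cumulative_escape(e_by_day, t - 1) * (1.0 - e_by_day[t - 1])
```

For example, an untreated person living with one untreated, fully infectious housemate, with b = 0.004 and p = 0.1, has a daily escape probability of 0.996 × 0.9 = 0.8964. Summing `infection_prob` over all trial days and adding `cumulative_escape` at day T gives 1, as the two outcomes are exhaustive.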

However, we do not observe the exact infection time but just the onset time of illness. Let $l_{\max}$ and $l_{\min}$ be the maximum and minimum duration of the latent period. Further define $\underline{t}_i = \tilde{t}_i - l_{\max}$ and $\bar{t}_i = \tilde{t}_i - l_{\min}$ as the earliest and latest potential infection day for person $i$. Rampey et al. (1992) showed that the likelihood contribution of person $i$ is

$$L_i = \begin{cases} Q_i(T), & \text{if individual } i \text{ is not infected}, \\ \sum_{t=\underline{t}_i}^{\bar{t}_i} g(\tilde{t}_i \mid t)\, Z_i(t), & \text{otherwise}, \end{cases} \tag{4}$$

where $g(\tilde{t}_i \mid t)$ is the probability of illness onset on day $\tilde{t}_i$, given infection on day $t$, or equivalently, the probability that the latent period lasts for $\tilde{t}_i - t$ days. The Newton-Raphson algorithm can be applied to obtain the MLEs.
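The contribution in (4) can be evaluated directly once the daily escape probabilities are available. The sketch below is illustrative only; the argument layout (a list of daily escape probabilities and a dictionary `g` mapping latent-period length in days to its probability) is an assumption, as is the truncation of candidate infection days at day 1.

```python
from math import prod

def likelihood_contribution(e_by_day, onset_day, g, l_min, l_max):
    """L_i in (4) for the prospective design.

    e_by_day[t - 1] holds e_i(t) for t = 1..T; g[d] is the probability
    that the latent period lasts d days; onset_day is None if person i
    has no illness onset during the trial.
    """
    def Q(t):
        return prod(e_by_day[:t])

    if onset_day is None:
        return Q(len(e_by_day))

    total = 0.0
    first = max(1, onset_day - l_max)  # assumption: no infection before day 1
    for t in range(first, onset_day - l_min + 1):
        Z_t = Q(t - 1) * (1.0 - e_by_day[t - 1])
        total += g.get(onset_day - t, 0.0) * Z_t
    return total
```

The log of this quantity, summed over all participants, is the objective that a Newton-Raphson or other numerical optimizer would maximize over (b, p, θ, φ).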

Household-level randomization leads to the same treatment status for all members of the same household, and thus the escape probability (1) simplifies to

$$q_{ij}(t) = 1 - \xi^{r_i(t)} p f(t \mid \tilde{t}_j), \tag{5}$$

where $\xi = \theta\phi$. Only $\xi$ is identifiable from within-household infections, hence $\theta$ can only be identified from contacts with the community, and $\phi$ is identified as $\xi$ divided by $\theta$.

For a case-ascertained follow-up trial, a household is enrolled only if an index case (i.e., illness) is identified, giving rise to left truncation. The full likelihood in (4) should be conditioned on infection status of household members on the illness onset day of the index case. By this means, the left truncation problem is solved by the conditional likelihood, and selection bias is eliminated because the index case, the direct reason for the enrollment of the household, does not contribute to the likelihood. To see this, assume that person $i$ is a non-index-case household member. Let $d_i$ denote the index case in the household of person $i$. Also, let $\tilde{t}_{d_i}$ be the illness onset day of $d_i$. Similarly, $\underline{t}_{d_i} = \tilde{t}_{d_i} - l_{\max}$ and $\bar{t}_{d_i} = \tilde{t}_{d_i} - l_{\min}$ are the earliest and latest potential infection day of the index case (see Figure 1).

[Figure 1 about here.]

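The marginal probability that $i$ does not show illness symptoms up to day $\tilde{t}_{d_i}$ is

$$\Pr[\tilde{t}_i > \tilde{t}_{d_i}] = \Bigl\{ \sum_{t=1}^{T} Z_i(t) \Pr[\tilde{t}_i > \tilde{t}_{d_i} \mid t] \Bigr\} + Q_i(T), \tag{6}$$

where $\Pr[\tilde{t}_i > \tilde{t}_{d_i} \mid t] = \sum_{\tau > \tilde{t}_{d_i}} g(\tau \mid t)$ is the probability that the latent period is longer than $\tilde{t}_{d_i} - t$, given person $i$ was infected on day $t$. Since the latent period is bounded between $l_{\min}$ and $l_{\max}$, $\Pr[\tilde{t}_i > \tilde{t}_{d_i} \mid t]$ is 0 for $\{t : \tilde{t}_{d_i} - t \ge l_{\max}\}$ and 1 for $\{t : \tilde{t}_{d_i} - t < l_{\min}\}$, or equivalently 0 for $\{t : t \le \underline{t}_{d_i}\}$ and 1 for $\{t : t > \bar{t}_{d_i}\}$. Also since $\bigl\{\sum_{t > \bar{t}_{d_i}}^{T} Z_i(t)\bigr\} + Q_i(T) = Q_i(\bar{t}_{d_i})$, the marginal probability reduces to

$$\Pr[\tilde{t}_i > \tilde{t}_{d_i}] = \Bigl\{ \sum_{t=\underline{t}_{d_i}+1}^{\bar{t}_{d_i}} Z_i(t) \Pr[\tilde{t}_i > \tilde{t}_{d_i} \mid t] \Bigr\} + Q_i(\bar{t}_{d_i}). \tag{7}$$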

By conditioning the full likelihood $L_i$ in (4) on the marginal likelihood $L^m_i = \Pr[\tilde{t}_i > \tilde{t}_{d_i}]$, the solution to the left truncation problem naturally results from a decomposition of $L_i$ and $L^m_i$ as follows. The marginal likelihood can be written as $L^m_i = Q_i(\underline{t}_{d_i}) A_i$, where

$$A_i = \Bigl\{ \sum_{t=\underline{t}_{d_i}+1}^{\bar{t}_{d_i}} \Bigl[\prod_{\tau=\underline{t}_{d_i}+1}^{t-1} e_i(\tau)\Bigr] \bigl[1 - e_i(t)\bigr] \Pr[\tilde{t}_i > \tilde{t}_{d_i} \mid t] \Bigr\} + \prod_{t=\underline{t}_{d_i}+1}^{\bar{t}_{d_i}} e_i(t). \tag{8}$$

Similarly, the full likelihood can be broken into $L_i = Q_i(\underline{t}_{d_i}) B_i$, where

$$B_i = \begin{cases} \prod_{t=\underline{t}_{d_i}+1}^{T} e_i(t), & \text{if } i \text{ is not infected}, \\ \sum_{t=\underline{t}_i}^{\bar{t}_i} g(\tilde{t}_i \mid t) \Bigl[\prod_{\tau=\underline{t}_{d_i}+1}^{t-1} e_i(\tau)\Bigr] \bigl[1 - e_i(t)\bigr], & \text{otherwise}, \end{cases}$$

is the full likelihood after $\underline{t}_{d_i}$. The conditional likelihood contribution of $i$ is given by

$$L^c_i = L_i / L^m_i = B_i / A_i, \tag{9}$$

and the joint conditional likelihood $\prod_i L^c_i$ for the whole population will be maximized to obtain the MLEs. The expression of $L^c_i$ in (9) shows that the likelihood history up to day $\underline{t}_{d_i}$ is cancelled in the conditional likelihood for case-ascertained follow-up studies, which is not surprising since $\underline{t}_{d_i} + 1$ is the first day with uncertainty about infection status of non-index-case members in the household. For an index case, both the full likelihood contribution and the marginal likelihood contribution are equal to the probability of illness onset on day $\tilde{t}_{d_i}$, and hence the likelihood contribution of the index case is cancelled by conditioning.
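The conditional contribution in (9) can be computed without ever forming the common factor for the days up to the earliest potential infection day of the index case. The following sketch shows one way to evaluate the ratio of the full to the marginal term for a non-index member whose onset, if any, falls after the index onset. The function and argument names are hypothetical, and `g` is assumed to map latent-period length in days to its probability.

```python
from math import prod

def conditional_contribution(e_by_day, onset_day, index_onset, g, l_min, l_max):
    """L^c_i in (9) for a non-index member; e_by_day[t - 1] holds e_i(t)."""
    lo = index_onset - l_max  # earliest potential infection day of the index case
    hi = index_onset - l_min  # latest potential infection day of the index case
    if lo < 0 or (onset_day is not None and onset_day <= index_onset):
        raise ValueError("sketch assumes lo >= 0 and onset after the index onset")

    def esc(first, last):
        # Product of e_i(tau) for tau = first..last (empty product is 1).
        return prod(e_by_day[first - 1:last])

    def infected_on(t):
        # Escape from day lo + 1 to t - 1, then infection on day t.
        return esc(lo + 1, t - 1) * (1.0 - e_by_day[t - 1])

    def onset_after_index(t):
        # Pr[onset later than the index onset | infected on day t].
        return sum(pr for d, pr in g.items() if d > index_onset - t)

    A = sum(infected_on(t) * onset_after_index(t) for t in range(lo + 1, hi + 1))
    A += esc(lo + 1, hi)

    if onset_day is None:
        B = esc(lo + 1, len(e_by_day))
    else:
        B = sum(g.get(onset_day - t, 0.0) * infected_on(t)
                for t in range(onset_day - l_max, onset_day - l_min + 1))
    return B / A
```

Working on the scale of the two bracketed terms rather than the full and marginal likelihoods avoids multiplying and then dividing by the same escape history, which may help numerically when follow-up is long.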

2.2 Iteratively re-weighted least squares (IRLS) model with the EM algorithm

We derive an alternative to the ML method in this section. The con-

vergence of the MLEs based on the observed information obtained by the

Newton-Raphson algorithm may be sensitive to initial estimates if the sam-

ple size is relatively small. In addition, the calculation of Fisher’s informa-

tion is impractical in expression (4) when the logarithm of a sum is involved.

According to the theory of generalized linear models, the Iteratively Re-

weighted Least Squares (IRLS) method is equivalent to the Fisher scoring

method if the weight matrix is the inverse of the true covariance matrix

of the responses for one-parameter exponential families. This suggests that

MLEs based on Fisher’s information can at least be well approximated if

a linear model is available and fitted using the IRLS method. Such a linear model is straightforward if we model pairwise transmissions. A slight modification of (1) takes the community-to-person transmission into account as follows:

$$q_{ij}(t) = 1 - \theta^{r_i(t)} \phi^{r_j(t)} p^{\psi_j} b^{1-\psi_j} f(t \mid \tilde{t}_j), \tag{10}$$

where $\psi_j = 1$ if $j$ is a household member, and 0 otherwise. If $\psi_j = 0$, then the exposure is from the community, and $f(t \mid \tilde{t}_j) = 1$ and $r_j(t) = 0$ for all $t$. Rearranging (10) and taking the logarithm on both sides,

$$\begin{aligned} \log\bigl(1 - q_{ij}(t)\bigr) &= \log(b) + \psi_j \log\frac{p}{b} + r_i(t)\log(\theta) + r_j(t)\log(\phi) + \log f(t \mid \tilde{t}_j) \\ &= \beta_0 + \beta_1 \psi_j + \beta_2 r_i(t) + \beta_3 r_j(t) + \log f(t \mid \tilde{t}_j). \end{aligned} \tag{11}$$

Model (11) does not distinguish between the treatment randomization schemes and it can be fitted to the data using the IRLS method with the EM algorithm (Dempster, Laird and Rubin, 1977).
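The correspondence between the natural parameters and the coefficients of model (11) can be made concrete with a short sketch. The helper names below are illustrative assumptions, and the infectiousness term is passed in as a plain number `f_jt`.

```python
from math import exp, log

def to_beta(b, p, theta, phi):
    """Map (b, p, theta, phi) to (beta0, beta1, beta2, beta3) as in (11)."""
    return (log(b), log(p / b), log(theta), log(phi))

def from_beta(beta):
    """Invert the mapping: recover (b, p, theta, phi) from the coefficients."""
    b0, b1, b2, b3 = beta
    return (exp(b0), exp(b0 + b1), exp(b2), exp(b3))

def log_transmission(beta, psi_j, r_i, r_j, f_jt):
    """log(1 - q_ij(t)): linear predictor plus the offset log f(t | onset of j).
    For community exposure use psi_j = 0, r_j = 0 and f_jt = 1."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * psi_j + b2 * r_i + b3 * r_j + log(f_jt)
```

With the simulation values b = 0.004, p = 0.1, θ = 0.7 and φ = 0.2, exponentiating `log_transmission` for a treated pair within a household with `f_jt = 0.5` returns 0.7 × 0.2 × 0.1 × 0.5 = 0.007, and for a treated person's community exposure it returns 0.7 × 0.004 = 0.0028.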

Let $Y_{ij}(t)$ be the binary response such that $\Pr[Y_{ij}(t) = 1] = 1 - q_{ij}(t)$, i.e., the indicator of whether person $i$ is infected by person $j$ on day $t$. The outcome $Y_{ij}(t)$ is not observable; however, the probabilities, or expected frequencies, of $Y_{ij}(t) = 1$ and $Y_{ij}(t) = 0$ conditioned on the knowledge of $\tilde{t}_i$ can be used to fit the model. When $t < \underline{t}_i$ or $i$ escapes infection throughout the trial, $Y_{ij}(t) = 0$ with probability 1, for all $j \in D_i \cup \{c\}$, where $c$ indicates exposure from the community. When $\underline{t}_i \le t \le \bar{t}_i$, potential outcomes of $Y_{ij}(t)$ include both 1 and 0. Let $\zeta_{ij}(t \mid \tilde{t}_i)$ and $\eta_{ij}(t \mid \tilde{t}_i)$ be the conditional expected frequencies of $Y_{ij}(t) = 1$ and $Y_{ij}(t) = 0$, respectively. To facilitate the calculation of $\zeta_{ij}(t \mid \tilde{t}_i)$ and $\eta_{ij}(t \mid \tilde{t}_i)$, it is necessary to define the following events:

• $I_i(t)$: the event that person $i$ has illness onset on day $t$.

• $\Lambda_i(t)$: the event that person $i$ is infected on day $t$.

• $\Lambda_{i,j}(t)$: the event that person $i$ is infected by $j$ on day $t$.

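Then, the conditional expected frequencies are given by

$$\zeta_{ij}(t \mid \tilde{t}_i) = \frac{\Pr[\Lambda_{i,j}(t)]}{\Pr[I_i(\tilde{t}_i)]} \times \Pr[I_i(\tilde{t}_i) \mid \Lambda_i(t)],$$

and

$$\eta_{ij}(t \mid \tilde{t}_i) = \frac{\Pr[I_i(\tilde{t}_i) \mid \Lambda_i(t)] \times \bigl\{\Pr[\Lambda_i(t)] - \Pr[\Lambda_{i,j}(t)]\bigr\}}{\Pr[I_i(\tilde{t}_i)]} + \sum_{\tau=t+1}^{\bar{t}_i} \frac{\Pr[I_i(\tilde{t}_i) \mid \Lambda_i(\tau)] \times \Pr[\Lambda_i(\tau)]}{\Pr[I_i(\tilde{t}_i)]}. \tag{12}$$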

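Suppose that we have estimates $(\hat{b}_{l-1}, \hat{p}_{l-1}, \hat{\theta}_{l-1}, \hat{\phi}_{l-1})$ from the $(l-1)$th iteration, then in the $l$th iteration we have

$$\Pr[\Lambda_i(t)] = Q_i(t-1) \Bigl\{ 1 - \bigl(1 - \hat{\theta}_{l-1}^{r_i(t)} \hat{b}_{l-1}\bigr) \prod_{j \in D_i} \bigl[1 - \hat{\theta}_{l-1}^{r_i(t)} \hat{\phi}_{l-1}^{r_j(t)} \hat{p}_{l-1} f(t \mid \tilde{t}_j)\bigr] \Bigr\},$$

$$\Pr[\Lambda_{i,j}(t)] = \begin{cases} Q_i(t-1)\, \hat{\theta}_{l-1}^{r_i(t)} \hat{\phi}_{l-1}^{r_j(t)} \hat{p}_{l-1} f(t \mid \tilde{t}_j), & j \in D_i, \\ Q_i(t-1)\, \hat{\theta}_{l-1}^{r_i(t)} \hat{b}_{l-1}, & j = c, \end{cases}$$

$$\Pr[I_i(\tilde{t}_i)] = \sum_{\tau=\underline{t}_i}^{\bar{t}_i} \Pr[I_i(\tilde{t}_i) \mid \Lambda_i(\tau)] \times \Pr[\Lambda_i(\tau)],$$

$$\Pr[I_i(\tilde{t}_i) \mid \Lambda_i(\tau)] = g(\tilde{t}_i \mid \tau).$$

The likelihood history before day $\underline{t}_i$ can be dropped from $\Pr[\Lambda_{i,j}(t)]$ and $\Pr[\Lambda_i(t)]$, since $Q_i(\underline{t}_i - 1)$ is the common factor and will eventually be cancelled out in the calculation of $\zeta_{ij}(t \mid \tilde{t}_i)$ and $\eta_{ij}(t \mid \tilde{t}_i)$.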
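The expectation step built on (12) can be sketched for a single infected person as follows. This is an illustration under stated assumptions, not the authors' code: the inputs are hypothetical dictionaries keyed by candidate infection day, and each source of exposure (a housemate or the community, labelled `'c'`) carries its current-iteration transmission probability.

```python
from math import prod

def expected_frequencies(Q_prev, trans, g_onset):
    """Conditional expected frequencies (zeta, eta) in (12) for one person i.

    Q_prev[t]  : Q_i(t - 1) for each candidate infection day t
    trans[t]   : dict mapping each source j to its transmission
                 probability to i on day t under the current estimates
    g_onset[t] : g(observed onset day | infection on day t)
    Returns two dicts keyed by (t, j).
    """
    days = sorted(Q_prev)
    # Pr[Lambda_i(t)]: escape up to t - 1, then infection by any source on day t.
    pr_inf = {
        t: Q_prev[t] * (1.0 - prod(1.0 - x for x in trans[t].values()))
        for t in days
    }
    # Pr[I_i(onset)]: sum over candidate infection days.
    pr_onset = sum(g_onset[t] * pr_inf[t] for t in days)

    zeta, eta = {}, {}
    for t in days:
        later = sum(g_onset[s] * pr_inf[s] for s in days if s > t)
        for j, x in trans[t].items():
            pr_by_j = Q_prev[t] * x  # Pr[Lambda_{i,j}(t)]
            zeta[(t, j)] = g_onset[t] * pr_by_j / pr_onset
            eta[(t, j)] = (g_onset[t] * (pr_inf[t] - pr_by_j) + later) / pr_onset
    return zeta, eta
```

Because only ratios enter the two frequencies, any common factor in `Q_prev` cancels, so `Q_prev` may be computed starting from the earliest candidate infection day.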

Suppose there are $H$ covariate patterns for model (11), then the binary responses can be summarized into $H$ binomial proportions $P_h$, $h = 1, \ldots, H$, based on the conditional expected frequencies. We fit model (11) by minimizing the objective function $\sum_{h=1}^{H} w_h \{\log(\hat{P}_h) - \log(\bar{P}_h)\}^2$, the weighted squared difference between the logarithms of the observed proportion $\hat{P}_h$ and the mean proportion $\bar{P}_h$. The weight for the $h$th category equals the reciprocal variance of the response function, i.e.,

$$w_h = \mathrm{VAR}^{-1}\bigl[\log(P_h)\bigr] \approx \frac{n_h \times \bar{P}_h}{1 - \bar{P}_h},$$

where $n_h$ is the number of observations in the $h$th category. The mean proportion $\bar{P}_h$ can be estimated by either $\hat{P}_h$ (data-based) or the fitted response $\tilde{P}_h$ (model-based). In our simulations, model-based weights lead to severe under-estimation of $\phi$, while data-based weights substantially over-estimate $\phi$. Using the arithmetic mean

$$w_h = \frac{1}{2} \Bigl\{ \frac{n_h \times \hat{P}_h}{1 - \hat{P}_h} + \frac{n_h \times \tilde{P}_h}{1 - \tilde{P}_h} \Bigr\}$$

leads to a close approximation to the MLEs. In addition, if $\hat{P}_h = 0$, we replace $\hat{P}_h$ by $\tilde{P}_h$ from the last iteration. Let $\hat{\beta}_0, \ldots, \hat{\beta}_3$ be the WLS estimates of coefficients for model (11), then the WLS estimates of the parameters at the $l$th iteration are

$$\hat{b}_l = \exp(\hat{\beta}_0), \quad \hat{p}_l = \exp(\hat{\beta}_0 + \hat{\beta}_1), \quad \hat{\theta}_l = \exp(\hat{\beta}_2), \quad \hat{\phi}_l = \exp(\hat{\beta}_3).$$

We then update the parameters and re-fit the model until the estimates converge. The standard errors of parameter estimates are obtained using the delta method.
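The averaged weight and a delta-method standard error for a parameter that is the exponential of a single coefficient can be sketched as below. The function names are illustrative assumptions, and the standard error helper covers only the single-coefficient case; a parameter that depends on two coefficients would also need their covariance.

```python
from math import exp

def averaged_weight(n_h, p_obs, p_fit):
    """Arithmetic mean of the data-based and model-based weights for one
    covariate pattern; a zero observed proportion is replaced by the
    fitted one, as described in the text."""
    if p_obs == 0:
        p_obs = p_fit
    data_based = n_h * p_obs / (1.0 - p_obs)
    model_based = n_h * p_fit / (1.0 - p_fit)
    return 0.5 * (data_based + model_based)

def delta_se_exp(beta_hat, se_beta):
    """First-order delta-method standard error of exp(beta_hat)."""
    return exp(beta_hat) * se_beta
```

For instance, a pattern with 100 observations, an observed proportion of 0.2 and a fitted proportion of 0.25 receives the weight (25 + 33.33) / 2, roughly 29.17.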

Though utilizing the conditional likelihood for the MLE method is natural, an adjustment with straightforward statistical meaning is difficult in the IRLS case. However, this difficulty can be circumvented by noticing that minimizing the weighted least squares is analogous to maximizing the log-likelihood. In the MLE method, we maximize

$$\sum_i \log(L^c_i) = \sum_i \log(L_i) - \sum_i \log(L^m_i) = \sum_i \log\bigl(B_i(\beta)\bigr) - \sum_i \log\bigl(A_i(\beta)\bigr), \tag{13}$$

where $A_i$ and $B_i$ in the conditional likelihood (9) are re-expressed as functions of $\beta = (\beta_0, \ldots, \beta_3)$. Therefore, it is natural to use the adjusting term in (13) to penalize the objective function

$$\sum_{h=1}^{H} w_h \{\log(\hat{P}_h) - \log(\bar{P}_h)\}^2 + \sum_i \log\bigl(A_i(\beta)\bigr).$$

Define the covariate matrix by $X$, the weight matrix by $W$ and the observed response vector by $\log(\hat{P})$, then at the $l$th iteration,

$$\hat{\beta}^{(l)} = (X^{\top} W X)^{-1} \Bigl\{ X^{\top} W \log(\hat{P}) - \frac{1}{2} \sum_i \frac{d \log\bigl(A_i(\hat{\beta}^{(l-1)})\bigr)}{d \hat{\beta}^{(l-1)}} \Bigr\}.$$

2.3 Assessing Fit

We use a frequency grouping method motivated by the development of

goodness-of-fit tests for logistic regression models (Hosmer and Lemeshow,

1980). Since the only observable binary outcome is illness onset times for each

person, we track the exposure status of each participant and then calculate

the probability of illness onset for each day based on the fitted model. This

probability serves two purposes: first, it is the model-predicted frequency of

illness onset times per person-day that will be compared with the observed

onset times; and second, it measures the risk level of exposure to infection

that is used to determine the risk categories.

The probability of illness onset on day $t$ for person $i$ is

$$\pi_i(t) = \sum_{\tau=t-l_{\max}}^{t-l_{\min}} \Bigl\{ 1 - \bigl(1 - \theta^{r_i(t)} b\bigr) \prod_{j \in D_i} \bigl[1 - \theta^{r_i(t)} \phi^{r_j(t)} p f(\tau \mid \tilde{t}_j)\bigr] \Bigr\} g(t \mid \tau).$$

Suppose the population of $\pi_i(t)$'s has been grouped into $m$ risk levels according to the percentiles with cut points $0 = c_0 < c_1 < \ldots < c_m = 1$. Then $\tilde{n}_k = \sum_{c_{k-1} < \pi_i < c_k} \pi_i$ is the fitted number of illness onsets at level $k$. Let $N_k$ be the total person-days and $n_k$ be the observed number of illness onsets at level $k$, then

$$\sum_{k=1}^{m} \frac{N_k (n_k - \tilde{n}_k)^2}{\tilde{n}_k (N_k - \tilde{n}_k)} \sim \chi^2_{m-2},$$

which simplifies to $\sum_{k=1}^{m} (n_k - \tilde{n}_k)^2 / \tilde{n}_k$, if $\tilde{n}_k \ll N_k$ for all $k$.

3. Simulation study

We created a discrete event stochastic simulation model to assess how well the

estimation procedures perform and to investigate what the best intervention

trial design would be. We created a simulation community composed of 749

households with 2000 people based on the distributions of age and household

sizes from the US Census 2000. Since households with a single member do

not provide information on φ, we do not include these households for analysis.

The profile of the simulated household sizes is {2 : 67%,3 : 13%,4 : 10%,5 :

7%,6 : 2%,7 : 1%}. We stopped the simulated trials on day 100 which

represents the typical length of the influenza season for a community. We set

the values of parameters as b = 0.004,p = 0.1,θ = 0.7,φ = 0.2. The empirical

latent and infectious period distributions were based on past experience with

influenza (Elveback et al., 1976), and are given in Table 1, from which $f(t \mid \tilde{t}_i)$

and $g(\tilde{t}_i \mid t)$ were derived. One thousand stochastic replications were carried

out for each scenario investigated.
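The daily transmission process that such a simulation iterates can be sketched for a single household as below. This is a deliberately simplified illustration, not the simulation model used in the study: the fixed latent and infectious durations are hypothetical placeholders for the empirical distributions of Table 1, treatment status is held constant over the trial, and the function name and arguments are assumptions.

```python
import random

def simulate_household(size, days, b, p, theta, phi, treated,
                       latent=2, infectious=4, rng=random):
    """Return the infection day of each member (None if never infected).

    treated[i] is 0 or 1; latent and infectious are fixed hypothetical
    durations in days, standing in for the empirical distributions.
    """
    infected_on = [None] * size
    for t in range(1, days + 1):
        newly_infected = []
        for i in range(size):
            if infected_on[i] is not None:
                continue
            susceptibility = theta ** treated[i]
            # Escape from the community, then from each infectious housemate.
            escape = 1.0 - susceptibility * b
            for j in range(size):
                t_j = infected_on[j]
                if j == i or t_j is None:
                    continue
                if t_j + latent <= t < t_j + latent + infectious:
                    escape *= 1.0 - susceptibility * (phi ** treated[j]) * p
            if rng.random() >= escape:
                newly_infected.append(i)
        # Infections take effect only after all members have been processed.
        for i in newly_infected:
            infected_on[i] = t
    return infected_on
```

A call such as `simulate_household(4, 100, 0.004, 0.1, 0.7, 0.2, [1, 0, 1, 0], rng=random.Random(1))` mimics one household under individual-level randomization; setting all entries of `treated` equal mimics household-level randomization.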

[Table 1 about here.]

We are interested in which intervention trial designs and estimation methods give the most efficient estimates of $\mathrm{AVE}_S$ and $\mathrm{AVE}_I$. Table 2 compares

estimates as well as Monte Carlo standard errors between the randomiza-

tion schemes (columns) and between methods of ascertainment and follow-up

(rows) for θ and φ. Under the household-level randomization in both follow

up studies, θ is slightly biased upward to 0.71, and φ is dramatically over-

estimated by 0.24. Most striking is that the individual-level randomization

is much more efficient than the household-level randomization regardless of

the follow up scheme. For instance, in the prospective follow up study, the

s.e.($\hat{\theta}$) is 0.083 and s.e.($\hat{\phi}$) is 0.045 under the individual-level randomization,

compared to 0.25 and 0.16, respectively, under the household-level random-

ization. In these simulations, of all infections that occurred in susceptible

people, only 5% on average occurred during the prophylaxis period under the

household-level randomization, much lower than 11% under the individual-

level randomization. The low proportion of infections during prophylaxis

leads to the larger standard errors of $\hat{\theta}$ and $\hat{\phi}$. However, the overall attack

rates are similar, 43% for the individual-level randomization and 44% for the

household-level randomization. Table 2 not only shows that the household-level randomization leads to positive bias and larger instability of $\hat{\theta}$ and $\hat{\phi}$,

but also confirms that the case-ascertained study is almost as efficient as the

prospective study in estimating θ and φ, given the same type of random-

ization scheme. Estimates of the transmission probabilities p and b do not

differ by trial design, and these estimates are not given in Table 2. The IRLS

estimates were similar to MLEs and almost as efficient for both trial designs

(results not shown).

[Table 2 about here.]

4. Generalization to heterogeneous populations

For a heterogeneous population composed of k risk categories of people (e.g.,

age groups), we will assume without loss of generality that the $\mathrm{AVE}_S$ and $\mathrm{AVE}_I$ are the same for all categories, but the transmission probabilities are different across categories. Specifically, let $p_{vu}$ be the pairwise transmission probability per unprotected contact between a susceptible individual in category $u$ and an infective person in category $v$. Further, let $b_u$ be the community transmission probability for category $u$. Hence, there will be $k$ parameters for community transmission probabilities and $k^2$ parameters for household transmission probabilities.

The construction of the likelihood for a heterogeneous population is the

same as that for a homogeneous population except that $p$ and $b$ in expressions (1) and (2) will be replaced by $p_{vu}$ and $b_u$ corresponding to relevant categories.

However, the linear model in (11) needs to be modified to accommodate

stratified transmission probabilities, and can be organized in matrix terms.

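Let $I_{\{i \in u\}}$ indicate whether individual $i$ belongs to category $u$ (1: $i \in u$, 0: $i \notin u$). Under the heterogeneity setting, the pairwise escape probability for a susceptible individual $i$ on day $t$ is given by

$$q_{ij}(t) = 1 - \theta^{r_i(t)} \phi^{r_j(t)} f(t \mid \tilde{t}_j) \Bigl\{ \sum_{u=1}^{k} b_u I_{\{i \in u\}} \Bigr\}^{1-\psi_j} \Bigl\{ \sum_{u,v=1}^{k} p_{vu} I_{\{i \in u\}} I_{\{j \in v\}} \Bigr\}^{\psi_j}. \tag{14}$$

Suppose group $k$ is the reference stratum. The model in matrix form based on (14) would be

$$\log\bigl(1 - q_{ij}(t)\bigr) = \beta^{(b)\tau} I_i + J_i^{\tau} \beta^{(p)} I_i + \beta^{(\theta)} r_i(t) + \beta^{(\phi)} r_j(t) + \log f(t \mid \tilde{t}_j), \tag{15}$$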

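where $\beta^{(\theta)} = \log(\theta)$, $\beta^{(\phi)} = \log(\phi)$, and

$$I_i = \bigl(I_{\{i \in 1\}}, \ldots, I_{\{i \in k-1\}}, 1\bigr)^{\tau},$$

$$J_i = \bigl(\psi_j I_{\{j \in 1\}}, \ldots, \psi_j I_{\{j \in k-1\}}, \psi_j\bigr)^{\tau},$$

$$\beta^{(b)} = \bigl(\beta^{(b)}_1, \ldots, \beta^{(b)}_k\bigr)^{\tau} = \Bigl(\log\frac{b_1}{b_k}, \ldots, \log\frac{b_{k-1}}{b_k}, \log(b_k)\Bigr)^{\tau},$$

$$\beta^{(p)} = \bigl\{\beta^{(p)}_{vu}\bigr\}_{k \times k} = \begin{pmatrix} \log\dfrac{p_{11} p_{kk}}{p_{1k} p_{k1}} & \cdots & \log\dfrac{p_{1(k-1)} p_{kk}}{p_{1k} p_{k(k-1)}} & \log\dfrac{p_{1k}}{p_{kk}} \\ \vdots & \ddots & \vdots & \vdots \\ \log\dfrac{p_{(k-1)1} p_{kk}}{p_{(k-1)k} p_{k1}} & \cdots & \log\dfrac{p_{(k-1)(k-1)} p_{kk}}{p_{(k-1)k} p_{k(k-1)}} & \log\dfrac{p_{(k-1)k}}{p_{kk}} \\ \log\dfrac{p_{k1} b_k}{p_{kk} b_1} & \cdots & \log\dfrac{p_{k(k-1)} b_k}{p_{kk} b_{k-1}} & \log\dfrac{p_{kk}}{b_k} \end{pmatrix}.$$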

The conditional expected frequencies $\zeta_{ij}(t)$ and $\eta_{ij}(t)$ have exactly the same

form as in (12), except that stratum-specific transmission probabilities need

to be used in the calculation.

5. Data analysis

Oseltamivir is an orally administered influenza neuraminidase inhibitor. Two

randomized controlled multi-center Phase III efficacy trials were conducted in

North America and Europe during the winter influenza seasons of 1998 - 1999

and 2000 - 2001. Details of the two trials are given in Table 3. Both trials were

case-ascertained with household-level randomization. In the first trial (trial

I, Welliver et al., 2001), index cases were not treated by either Oseltamivir

or placebo but eligible exposed household members (aged 12+ years) were

given Oseltamivir for prophylaxis or placebo within 48 hours of the onset

of illness symptoms in the index case. In the second trial (trial II, Hayden

et al., 2004), all index cases were treated by Oseltamivir after ascertainment.

Exposed household members (aged 1+ years) were randomized to groups

with or without Oseltamivir for prophylaxis and, in the case of illness, were

treated by Oseltamivir. The $\mathrm{AVE}_S$ can be estimated solely from trial I, but most of the information about the $\mathrm{AVE}_I$ comes from pooling data from the two trials.

[Table 3 about here.]

For trial I, 372 households were recruited with a total of 1329 participants,

while for trial II, 277 households were recruited with 1110 participants. In

trial I, 38 out of 464 susceptible participants in the placebo group had a

laboratory-confirmed infection, while 4 out of 493 susceptible participants

prophylaxed with Oseltamivir were infected. In trial II, 45 out of 392 sus-

ceptible participants in the group without prophylaxis and 14 out of 420 in

the group with Oseltamivir prophylaxis became infected. About half of the

index cases were laboratory-confirmed to be infected in both trials. We only

used laboratory-confirmed illness in our analysis.

We assume the two trials share the same $p$, $\theta$ and $\phi$, but have different probabilities of infection from the community, i.e., $b_1$ for trial I and $b_2$ for trial II. Hence, the pooled population is partially stratified by trial. Let $\gamma_i$ be the trial indicator for person $i$ (1: trial I, 0: trial II). Slight modification is necessary for the IRLS model, where equation (11) is replaced by

$$\log\bigl(1 - q_{ij}(t)\bigr) = \log(b_2) + \gamma_i (1 - \psi_j) \log\frac{b_1}{b_2} + \psi_j \log\frac{p}{b_2} + r_i(t)\log(\theta) + r_j(t)\log(\phi) + \log f(t \mid \tilde{t}_j).$$

The estimated parameters are given in Table 4. The IRLS and ML methods give very similar parameter estimates, but the IRLS method tends to give somewhat smaller standard errors. We used the non-iteratively weighted least squares estimates (see Appendix) as starting values for both the IRLS and the Newton-Raphson algorithms, but only the IRLS estimates converged. The MLEs in Table 4 were obtained with the IRLS estimates as starting values. This indicates that the IRLS approach is less sensitive to data sparseness, and can thus provide initial estimates for the maximum likelihood approach.

[Table 4 about here.]

The prophylactic use of Oseltamivir is shown to be significantly protective against infection with illness by $\widehat{\mathrm{AVE}}_S = 0.86$ with a 95% C.I. (0.70, 1.0). Thus, if a person uses Oseltamivir prophylactically, it reduces his chance of being infected with illness by 86% per exposure to an untreated infected person. In addition, Oseltamivir significantly reduces the infectiousness of infected people who take the drug therapeutically, as shown by $\widehat{\mathrm{AVE}}_I = 0.62$ with a 95% C.I. (0.31, 1.0). Thus, if an infected person uses Oseltamivir

therapeutically, it reduces his chance of transmitting influenza to another

person not using the drug by 62% per exposure. Welliver et al. (2001)

reported an efficacy in preventing clinical influenza of 89% (95%C.I.: 71%-

96%) and an efficacy in inhibiting viral shedding of 84% (95%C.I.: 57%-

95%) for trial I, and Hayden et al. (2004) reported a conditional efficacy in

preventing lab-confirmed influenza of 68% (95%C.I.: 35%-84%) when index

cases were treated. These results are comparable to our efficacy estimates.

The MLEs of $b_1$ and $b_2$ for the two trials are about the same, indicating

that each untreated person had about 1/1000 chance of being infected from

outside of the household each day. The estimate of p indicates about a 2%

chance of secondary household spread between two household members per

day.

Table 5 shows the results based on MLEs when we use the heterogeneous model to stratify the analysis across age groups. We divided the population into two age groups: children (1-17) and adults (18+). The estimated community transmission probability for children (0.0023) is about four times as high as that for adults (0.00055). For secondary spread within the household, the estimated transmission probability between untreated children ($\hat{p}_{cc} = 0.038$) is nearly twice as high as that between untreated adults ($\hat{p}_{aa} = 0.022$). The efficacy estimates, $\widehat{\mathrm{AVE}}_S = 0.85$ (0.69, 1.0) and $\widehat{\mathrm{AVE}}_I = 0.66$ (0.36, 1.0), are about the same as found in the unstratified analysis. Another way to assess secondary spread in the household is via the household secondary attack rate (SAR). We define $\mathrm{SAR}_{xy}$ as the probability that an untreated infected person with covariate value $x$ infects an untreated household member with covariate value $y$ throughout the former person's infectious period. Since we assume the average infectious period for influenza to be 4.1 days, we define $\mathrm{SAR}_{xy} = 1 - (1 - p_{xy})^{4.1}$. The estimated SARs are given in Table 5. The estimated SAR among children, 0.15, is almost twice that among adults, 0.086.
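The SAR definition above is a one-line transformation of the daily transmission probability. A minimal sketch, with a hypothetical function name, is:

```python
def secondary_attack_rate(p_xy, mean_infectious_days=4.1):
    """SAR_xy = 1 - (1 - p_xy) ** 4.1, using the assumed mean infectious
    period of 4.1 days as the exponent."""
    return 1.0 - (1.0 - p_xy) ** mean_infectious_days
```

Applied to the rounded estimates quoted in the text, `secondary_attack_rate(0.038)` and `secondary_attack_rate(0.022)` come out close to the reported 0.15 and 0.086; small discrepancies are expected because the inputs here are rounded to two significant digits.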

[Table 5 about here.]

We assessed how well the heterogeneous model fits the data. We categorized

the probabilities of illness onset per person-day into 10 risk levels in Table 6

according to the clustering pattern of the probabilities. For the most part,

there is good agreement between the observed and predicted illness onset

counts. To calculate an approximate chi-square statistic, we collapsed the first three levels into one level, levels 4 and 5 into one level and levels 6 and 7 into one level. This results in $\chi^2_4 = 2.36$, which has a p-value of 0.67, indicating an adequate fit.
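The grouped statistic of Section 2.3 and the quoted p-value can be reproduced with a few lines. The sketch below is illustrative: the function names are assumptions, and the p-value helper uses the closed-form upper tail of the chi-square distribution that holds specifically for 4 degrees of freedom (the 10 risk levels collapse to 6, giving 6 − 2 = 4).

```python
from math import exp

def grouped_chi_square(observed, fitted, person_days):
    """Sum over risk levels of N_k (n_k - fitted_k)^2 / (fitted_k (N_k - fitted_k))."""
    return sum(
        N * (n - n_fit) ** 2 / (n_fit * (N - n_fit))
        for n, n_fit, N in zip(observed, fitted, person_days)
    )

def chi_square_pvalue_4df(x):
    """Upper tail probability of a chi-square variable with 4 degrees of
    freedom, which has the closed form exp(-x/2) * (1 + x/2)."""
    return exp(-x / 2.0) * (1.0 + x / 2.0)
```

Evaluating `chi_square_pvalue_4df(2.36)` gives a value that rounds to 0.67, in agreement with the p-value reported above.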

[Table 6 about here.]

6. Discussion

We have developed statistical methods for estimating the effectiveness of

infectious disease interventions from illness incidence data in close contact

groups. We have used these methods to evaluate which study design may

be the best for clinical trials in close contact groups. Our simulations in-

dicate that the individual level randomization scheme is more efficient than

the household level randomization in terms of producing smaller standard

errors for the parameter estimates of interest. The result is consistent with

that of Datta et al. (1999) who found that, for households of size two, indi-

vidual level randomization was generally more efficient than household level

randomization. Our simulations also reveal that, if correctly adjusted for left

truncation, the case-ascertained follow-up study provides estimates of the

same quality as those provided by the prospective follow-up study, although

the latter contains extra information about the CPI. The IRLS method that

we developed provides parameter estimates nearly as good as the maximum

likelihood method in both the simulated and actual data analysis.

In this work, we provide the first joint estimates of the efficacy of an

influenza antiviral agent in reducing susceptibility to infection with illness

($AVE_S$) and in reducing infectiousness ($AVE_I$). We estimate that the prophylactic

use of Oseltamivir reduces the probability of infection and illness

given exposure to infection by 85%. In addition, if Oseltamivir is taken soon

after influenza symptoms appear, then the probability the infected person

will transmit to others in the household is reduced by 66%. Longini et al.

(2004) used values similar to these estimates in a stochastic simulation model

of pandemic influenza to show that the targeted use of Oseltamivir in close

contact groups could slow the transmission of influenza on the community

level. Such an intervention would be important because there would be little

or no vaccine available against the first wave of pandemic influenza (Longini,

Ackerman and Elveback, 1978; Longini et al., 2004).

In this work, we have estimated the age-group specific household SARs

for influenza. Our estimated child-to-child SAR of 15% is nearly twice as

high as that between adults. In addition, the infection probability from outside

of the household for children is estimated to be four times as high as that

for adults. These estimates are comparable to those of Addy et al. (1991),

confirming that children are both the major introducers and spreaders of

influenza in households. These estimates are also important for construction

of influenza simulation models (Longini et al., 2004; Halloran et al., 2002).

Several issues have not been addressed in this paper and are subject

to future investigation. First, heterogeneity may be present in both the

probability that a household is invaded with influenza and SARs within

households. In this case, random effects terms added to the transmission

probabilities might be appropriate (Halloran, Préziosi and Chu, 2003). However,

this would complicate the model and increase the richness of the data

requirements. We already take the clustering of cases into account through

the transmission model that includes secondary transmission within house-

holds. A related issue is the possibility of ascertainment bias for the case-

ascertainment design since households more prone to influenza invasion have

a higher probability of having an index case and, thus, getting into the sam-

ple. Dealing with this potential source of bias will require further research.


Finally, parameter estimation could be affected by potential misclassification

of infection and illness status of study participants. Validation set methods

could be developed for study designs that include more sensitive infection

detection tests for a subset of study participants (Halloran et al., 2003).

Based on this research, we recommend that future intervention trials in

close contact groups for the efficacy of the intervention to prevent infection,

illness and transmission be randomized on an individual level and be case-

ascertained if there are limited resources for the trial. The close contact

group setting is the best way to condition on exposure to infection which

allows accurate estimation of the intervention efficacy (Halloran et al., 1999).

These estimates provide the best information for predicting the individual-

level effect of the intervention, and can be used in transmission models to

assess the potential effect of the intervention on a community level.

Acknowledgements

This work was partially supported by National Institute of Allergy and Infec-

tious Diseases grant R01-AI32042. The data on the household Oseltamivir

prophylaxis studies were provided by Roche Laboratories Inc.

References

Addy, C. L., Longini, I. M. and Haber, M. S. (1991). A generalized stochastic

model for the analysis of infectious disease final size data. Biometrics 47,

961–974.

Becker, N. G. (1989). Analysis of infectious disease data. Chapman and Hall,

New York.


Census 2000 (2001). United States Census Bureau. http://www.census.gov/

(Posted July 25, 2001).

Datta, S., Halloran, M. E. and Longini, I. M. (1999). Efficiency of estimating

vaccine efficacy for susceptibility and infectiousness: randomization by

individual versus household. Biometrics 55, 792–798.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood

from incomplete data via the EM algorithm. Journal of the Royal

Statistical Society, Series B 39, 1–38.

Donner, A. (1998). Some aspects of the design and analysis of cluster ran-

domized trials. Applied Statistics 47, 95–113.

Elveback, L. R., Fox, J. P. and Ackerman, E. (1976). An influenza simulation

model for immunization studies. American Journal of Epidemiology 103,

152–165.

Halloran, M. E., Longini, I. M., Cowart, D. M. and Nizam, A. (2002). Com-

munity trials of vaccination and the epidemic prevention potential. Vac-

cine 20, 3254–3262.

Halloran, M. E., Longini, I. M., Gaglani, M. J., Piedra, P. A. and Chu,

H. (2003). Estimating efficacy of trivalent, cold-adapted, influenza virus

vaccine (CAIV-T) against influenza A (H1N1) and B using surveillance

cultures. American Journal of Epidemiology 158, 305–311.

Halloran, M. E., Longini, I. M. and Struchiner, C. J. (1999). Design and

interpretation of vaccine field studies. In Monto, A. S. and Thacker,

S. B., editors, Epidemiologic Reviews: Vaccines, volume 21, pages 73–88.

Halloran, M. E., Préziosi, M. and Chu, H. (2003). Estimating vaccine efficacy

from secondary attack rates. Journal of the American Statistical

Association 98, 38–46.

Hayden, F. G., Belshe, R., Villanueva, C., Lanno, R., Hughes, C., Small, I.,

Dutkowski, R., Ward, P. and Carr, J. (2004). Management of influenza in

households: a prospective, randomized comparison of oseltamivir treat-

ment with or without postexposure prophylaxis. Journal of Infectious

Diseases 189, 440–449.

Hayes, R., Mosha, F., Nicol, A., Grosskurth, H., Newell, J., Todd, J.,

Killewo, J., Rugemalila, J. and Mabey, D. (1995). A community trial of

the impact of improved sexually transmitted disease treatment on the

HIV epidemic in rural Tanzania: 1. Design. AIDS 9, 919–926.

Hosmer, D. W. and Lemeshow, S. (1980). Goodness of fit tests for the

multiple logistic regression model. Communications in Statistics A9(10),

1043–1069.

Longini, I. M., Ackerman, E. and Elveback, L. R. (1978). An optimization

model for influenza A epidemics. Math Biosci 38, 141–157.

Longini, I. M., Halloran, M. E., Nizam, A. and Yang, Y. (2004). Containing

pandemic influenza with antiviral agents. American Journal of Epidemi-

ology 159, 623–633.

Longini, I. M. and Koopman, J. S. (1982). Household and community trans-

mission parameters from final distributions of infections in households.

Biometrics 38, 115–126.

Longini, I. M., Koopman, J. S., Haber, M. and Cotsonis, G. A. (1988).

Statistical inference for infectious diseases: Risk-specified household and

community transmission parameters. American Journal of Epidemiology

128, 845–859.


Magder, L. and Brookmeyer, R. (1993). Analysis of infectious disease data

from partner studies with unknown source of infection. Biometrics 49,

1110–1116.

Rampey, A. H., Longini, I. M., Haber, M. J. and Monto, A. S. (1992). A

discrete-time model for the statistical analysis of infectious disease inci-

dence data. Biometrics 48, 117–128.

Satten, G. A., Mastro, T. D. and Longini, I. M. (1994). Modelling the female-

to-male per-act HIV transmission probability in an emerging epidemic in

Asia. Statistics in Medicine 13, 2097–2106.

Welliver, R., Monto, A. S., Carewicz, O., Schatteman, E., Hassman, M.,

Hedrick, J., Jackson, H. C., Huson, L., Ward, P. and Oxford, J. S. (2001).

Effectiveness of oseltamivir in preventing influenza in household contacts:

a randomized controlled trial. JAMA 285, 748–754.

Appendix A

Non-iteratively weighted least squares initial estimates in homogeneous

population

Both MLE and IRLS require initial estimates to start the iteration. In the

case of small sample size, reasonable initial estimates are crucial for convergence. In the IRLS method, the event of pairwise transmission $\Lambda_{i,j}(t)$ is modelled. In the calculation of the conditional expected frequencies of infection $\zeta_{ij}(t) = \Pr[\Lambda_{i,j}(t) \mid \Lambda_i(t)] \times \Pr[\Lambda_i(t) \mid I_i(\tilde t_i)]$, pre-evaluation of parameters is involved in both components on the right side. This suggests that, if we model $\Lambda_i(t)$ instead of $\Lambda_{i,j}(t)$, i.e., daily transmission instead of pairwise transmission, the first component $\Pr[\Lambda_{i,j}(t) \mid \Lambda_i(t)]$ will no longer be present. Moreover, if we assume equal $\Pr[\Lambda_i(t)]$ for all possible $t$, the second component $\Pr[\Lambda_i(t) \mid I_i(\tilde t_i)]$ will be simplified to $\Pr[I_i(\tilde t_i) \mid \Lambda_i(t)]$, which is also known as the empirical distribution $g_i(t)$, and there will be no need for pre-estimation of parameters at all. Therefore, a non-iterative model is available by modelling $\Lambda_i(t)$ and simplifying $\Pr[\Lambda_i(t) \mid I_i(\tilde t_i)]$.

Taking the logarithm of both sides of (2), where qij(t) is defined in (5),

we have

$$\log(e_i(t)) = \log\bigl(1 - \theta^{r_i(t)} b\bigr) + \sum_{j \in D_i} \log\bigl(1 - \theta^{r_i(t)} \phi^{r_j(t)} f(t \mid \tilde t_j)\, p\bigr). \tag{A.1}$$

To turn (A.1) into a linear model, some approximation techniques are needed. It can be shown that $(1 - p)^{\theta} < (1 - \theta p) < e^{-\theta p}$ holds for $\theta, p \in (0, 1)$, and both $(1 - p)^{\theta}$ and $e^{-\theta p}$ are good approximations to $(1 - \theta p)$ when $p$ is small. Both approximations have their own background in defining the vaccine or antiviral efficacy. If the antiviral agent increases the escape probability from $1 - p$ to $(1 - p)^{\theta}$, then the efficacy is given by

$$AVE_S = 1 - \frac{\log([1 - p]^{\theta})}{\log(1 - p)} = 1 - \frac{\log(1 - \mathrm{SAR}_{trt})}{\log(1 - \mathrm{SAR}_{placebo})},$$

which is often used to estimate vaccine efficacy from cumulative incidence data. For $e^{-\theta p}$, obviously $\theta$ is the relative residual cumulative hazard. Using the above relationship, we can use either approximation to develop a linear model. However, the exponential approximation $e^{-\theta p}$ performs slightly better than $(1 - p)^{\theta}$ when $\theta$ is small. Also, from the perspective of escape probability, $(1 - \theta p)^n \approx e^{-\lambda \theta p}$ is a natural survival function if the number of contacts $n$ has a Poisson distribution (Satten et al., 1994). Hence, $e^{-\theta p}$ is recommended if the efficacy is believed to be larger than 50%. With the exponential approximation, (2) can be re-written as

$$e_i(t) = \exp\bigl\{-\theta^{r_i(t)} b\bigr\} \times \exp\Bigl\{-\sum_{j \in D_i} \theta^{r_i(t)} \phi^{r_j(t)} f_j(t \mid \tilde t_j)\, p\Bigr\}. \tag{A.2}$$
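To make the comparison of the two approximations concrete, the sketch below evaluates the absolute error of $(1 - p)^{\theta}$ and of $e^{-\theta p}$ relative to $1 - \theta p$ for a few values of $\theta$. The value $p = 0.02$ and the grid of $\theta$ values are assumptions chosen only for illustration, and the function name is hypothetical.

```python
import math


def approximation_errors(theta, p):
    """Absolute errors of (1-p)^theta and exp(-theta*p) as stand-ins for 1 - theta*p."""
    exact = 1.0 - theta * p
    power_error = abs((1.0 - p) ** theta - exact)
    exponential_error = abs(math.exp(-theta * p) - exact)
    return power_error, exponential_error


p = 0.02  # illustrative daily transmission probability (assumed value)
for theta in (0.15, 0.30, 0.70, 0.90):
    power_error, exponential_error = approximation_errors(theta, p)
    closer = "exponential" if exponential_error < power_error else "power"
    print(f"theta={theta:.2f}  power={power_error:.2e}  "
          f"exp={exponential_error:.2e}  closer: {closer}")
```

For small $p$ the leading error terms are about $\theta(1-\theta)p^2/2$ for the power form and $\theta^2 p^2/2$ for the exponential form, so the exponential form is closer when $\theta < 0.5$, that is, when the efficacy $1 - \theta$ exceeds 50%.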


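Apply $\theta^{r_i(t)} = 1 + r_i(t)(\theta - 1)$ to (A.2) and take the logarithm on both sides, yielding

$$
\begin{aligned}
-\log(e_i(t)) = {} & \beta_0 + \beta_1 r_i(t) \\
& + \beta_2 \Bigl\{\sum_j (1 - r_i(t))(1 - r_j(t)) f(t \mid \tilde t_j)\Bigr\} + \beta_3 \Bigl\{\sum_j r_i(t)(1 - r_j(t)) f(t \mid \tilde t_j)\Bigr\} \\
& + \beta_4 \Bigl\{\sum_j (1 - r_i(t)) r_j(t) f(t \mid \tilde t_j)\Bigr\} + \beta_5 \Bigl\{\sum_j r_i(t) r_j(t) f(t \mid \tilde t_j)\Bigr\},
\end{aligned}
\tag{A.3}
$$

where $\beta_0 = b$, $\beta_1 = b(\theta - 1)$, $\beta_2 = p$, $\beta_3 = \theta p$, $\beta_4 = \phi p$, and $\beta_5 = \theta\phi p$.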
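The linearization can be verified numerically. The sketch below compares $-\log e_i(t)$ computed directly from (A.2) with the linear form $\beta_0 + \beta_1 r_i(t) + \beta_2 x_2 + \dots + \beta_5 x_5$, where $\beta_0 = b$, $\beta_1 = b(\theta - 1)$, $\beta_2 = p$, $\beta_3 = \theta p$, $\beta_4 = \phi p$ and $\beta_5 = \theta\phi p$. All numerical values, the function names and the contact list are hypothetical and serve only as an illustration for binary treatment indicators.

```python
def hazard_exact(r_i, contacts, theta, phi, b, p):
    """-log e_i(t) under (A.2); contacts is a list of (r_j, f_j) pairs."""
    community = theta ** r_i * b
    household = sum(theta ** r_i * phi ** r_j * f * p for r_j, f in contacts)
    return community + household


def hazard_linear(r_i, contacts, theta, phi, b, p):
    """Linear form with beta_0..beta_5 built from (theta, phi, b, p)."""
    beta = [b, b * (theta - 1.0), p, theta * p, phi * p, theta * phi * p]
    x2 = sum((1 - r_i) * (1 - r_j) * f for r_j, f in contacts)  # untreated i, untreated j
    x3 = sum(r_i * (1 - r_j) * f for r_j, f in contacts)        # treated i, untreated j
    x4 = sum((1 - r_i) * r_j * f for r_j, f in contacts)        # untreated i, treated j
    x5 = sum(r_i * r_j * f for r_j, f in contacts)              # treated i, treated j
    return (beta[0] + beta[1] * r_i + beta[2] * x2 + beta[3] * x3
            + beta[4] * x4 + beta[5] * x5)


# Hypothetical parameter values and household contacts, for illustration only.
theta, phi, b, p = 0.3, 0.4, 0.001, 0.02
contacts = [(0, 1.0), (1, 0.5), (1, 1.0)]  # (treatment indicator r_j, infectiousness f_j)

for r_i in (0, 1):
    exact = hazard_exact(r_i, contacts, theta, phi, b, p)
    linear = hazard_linear(r_i, contacts, theta, phi, b, p)
    print(f"r_i={r_i}: exact={exact:.6f}  linear={linear:.6f}")
```

The two columns agree for both treatment states because $r_i(t)$ and $r_j(t)$ take only the values 0 and 1, so each contact contributes to exactly one of the four covariates.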

Model (A.3) gives rise to multiple estimators for the efficacy parameters because of the increase in parameter dimension. Then $\theta$ has three estimators: $\hat\theta_1 = 1 + \hat\beta_1/\hat\beta_0$, $\hat\theta_2 = \hat\beta_3/\hat\beta_2$ and $\hat\theta_3 = \hat\beta_5/\hat\beta_4$, while $\phi$ has two estimators: $\hat\phi_1 = \hat\beta_4/\hat\beta_2$ and $\hat\phi_2 = \hat\beta_5/\hat\beta_3$. The average of the multiple estimates weighted by reciprocal standard errors can serve as the initial estimate, e.g., $\hat\theta = \sum_{i=1}^{3} \omega_i \hat\theta_i$, where

$$\omega_i = \frac{1/\mathrm{s.e.}(\hat\theta_i)}{\sum_{j=1}^{3} 1/\mathrm{s.e.}(\hat\theta_j)}.$$

For household-level randomization, since $r_i(t) = r_j(t)$ for $i$ and $j$ in the same household, (A.3) can be simplified to

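$$-\log(e_i(t)) = \beta_0 + \beta_1 r_i(t) + \beta_2 (1 - r_i(t)) \Bigl\{\sum_j f_j(t \mid \tilde t_j)\Bigr\} + \beta_5\, r_i(t) \Bigl\{\sum_j f_j(t \mid \tilde t_j)\Bigr\}.$$

In this case, each parameter has a unique estimator, but $\theta = 1 + \beta_1/\beta_0$ can only be estimated from the community-to-person transmission.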
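For the individual-level model, where several ratio estimators of the same parameter are available, the reciprocal-standard-error weighting described above can be sketched as follows. The three estimates of $\theta$ and their standard errors are hypothetical numbers, not results from the trials, and the function name is illustrative.

```python
def combine_estimates(estimates, standard_errors):
    """Average estimates with weights proportional to 1 / s.e."""
    if len(estimates) != len(standard_errors):
        raise ValueError("need one standard error per estimate")
    reciprocals = [1.0 / se for se in standard_errors]
    total = sum(reciprocals)
    weights = [value / total for value in reciprocals]
    combined = sum(w * est for w, est in zip(weights, estimates))
    return combined, weights


# Hypothetical estimates of theta from the three beta ratios, with hypothetical s.e.
theta_hats = [0.20, 0.30, 0.25]
theta_ses = [0.10, 0.05, 0.20]

theta_initial, weights = combine_estimates(theta_hats, theta_ses)
print(f"weights = {[round(w, 3) for w in weights]}")
print(f"initial theta = {theta_initial:.3f}")
```

The estimator with the smallest standard error receives the largest weight, and the weights sum to one, so the combined value always lies between the smallest and largest individual estimates.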

Assume that $\Pr[\Lambda_i(t)]$ is equal for all $t \in [\underline{t}_i, \bar t_i]$. Then the expected frequencies of infection and escape are

$$\zeta_i(t \mid \tilde t_i) = \Pr[I_i(\tilde t_i) \mid \Lambda_i(t)], \qquad \eta_i(t \mid \tilde t_i) = \sum_{\tau = t+1}^{\bar t_i} \Pr[I_i(\tilde t_i) \mid \Lambda_i(\tau)],$$

where $\Pr[I_i(\tilde t_i) \mid \Lambda_i(t)] = g(\tilde t_i \mid t)$. Since these conditional expected frequencies do not involve any parameters, model (A.3) can be fitted non-iteratively without pre-estimates of the parameters.
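A minimal sketch of these parameter-free frequencies is given below. It assumes that $g(\tilde t_i \mid t)$ is the probability that the latent period lasts exactly $\tilde t_i - t$ days, taken from the latent period distribution in Table 1, and that the candidate infection days run from the onset day minus the longest latent period to the onset day minus the shortest one. The onset day of 10 and all function names are hypothetical.

```python
def latent_pmf_from_cdf(cdf):
    """Convert {duration: cumulative probability} into {duration: probability}."""
    pmf, previous = {}, 0.0
    for duration in sorted(cdf):
        pmf[duration] = cdf[duration] - previous
        previous = cdf[duration]
    return pmf


def expected_frequencies(onset_day, latent_pmf):
    """Return (zeta, eta), both keyed by candidate infection day t.

    zeta[t] = g(onset_day | t): probability of onset on onset_day given infection on t.
    eta[t]  = sum of zeta[tau] over candidate days tau > t (escape through day t).
    """
    earliest = onset_day - max(latent_pmf)
    latest = onset_day - min(latent_pmf)
    days = range(earliest, latest + 1)
    zeta = {t: latent_pmf.get(onset_day - t, 0.0) for t in days}
    eta = {t: sum(zeta[tau] for tau in days if tau > t) for t in days}
    return zeta, eta


latent_cdf = {1: 0.2, 2: 0.8, 3: 1.0}  # latent period, Table 1
latent_pmf = latent_pmf_from_cdf(latent_cdf)
zeta, eta = expected_frequencies(onset_day=10, latent_pmf=latent_pmf)

for t in sorted(zeta):
    print(f"day {t}: zeta = {zeta[t]:.1f}, eta = {eta[t]:.1f}")
```

For an onset on day 10 the candidate infection days are 7, 8 and 9, and none of the computed quantities depends on $\theta$, $\phi$, $b$ or $p$, which is what allows the model to be fitted without iteration.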

Appendix B

Non-iteratively weighted least squares initial estimates in heterogeneous

population

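The probability that person $i$ escapes infection within day $t$ could be written as:

$$e_i(t) = \Bigl\{1 - \theta^{r_i(t)} \sum_{u=1}^{k} b_u I_{\{i \in u\}}\Bigr\} \times \prod_{j \in D_i} \Bigl\{1 - \theta^{r_i(t)} \phi^{r_j(t)} f(t \mid \tilde t_j) \sum_{u,v=1}^{k} p_{vu} I_{\{i \in u\}} I_{\{j \in v\}}\Bigr\}.$$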

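Applying the exponential approximation leads to the following model in matrix notation:

$$
\begin{aligned}
-\log(e_i(t)) = {} & \beta^{(b)\tau} I_i + \beta^{(\theta b)\tau} I_i^{(r)}(t) + \tilde J_i^{\tau} \beta^{(p)} \tilde I_i + \tilde J_i^{\tau} \beta^{(\theta p)} \tilde I_i^{(r)}(t) \\
& + \tilde J_i^{(r)}(t)^{\tau} \beta^{(\phi p)} \tilde I_i + \tilde J_i^{(r)}(t)^{\tau} \beta^{(\theta\phi p)} \tilde I_i^{(r)}(t),
\end{aligned}
\tag{B.4}
$$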

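where the covariates and coefficients are defined by

$$
\begin{aligned}
I_i &= \bigl(I_{\{i \in 1\}}, \ldots, I_{\{i \in k-1\}}, 1\bigr)^{\tau}, & I_i^{(r)}(t) &= r_i(t)\, I_i, \\
\tilde I_i &= \bigl(I_{\{i \in 1\}}, \ldots, I_{\{i \in k\}}\bigr)^{\tau}, & \tilde I_i^{(r)}(t) &= r_i(t)\, \tilde I_i, \\
\tilde J_i &= \Bigl(\sum_{j \in D_i} I_{\{j \in 1\}} f(t \mid \tilde t_j), \ldots, \sum_{j \in D_i} I_{\{j \in k\}} f(t \mid \tilde t_j)\Bigr)^{\tau}, \\
\tilde J_i^{(r)}(t) &= \Bigl(\sum_{j \in D_i} r_j(t) I_{\{j \in 1\}} f(t \mid \tilde t_j), \ldots, \sum_{j \in D_i} r_j(t) I_{\{j \in k\}} f(t \mid \tilde t_j)\Bigr)^{\tau}, \\
\beta^{(b)} &= \bigl(b_1 - b_k, \ldots, b_{k-1} - b_k, b_k\bigr)^{\tau}, & \beta^{(\theta b)} &= (\theta - 1)\beta^{(b)}, \\
\beta^{(p)} &= \begin{pmatrix} p_{11} & \cdots & p_{1k} \\ \vdots & \ddots & \vdots \\ p_{k1} & \cdots & p_{kk} \end{pmatrix}, & \beta^{(\theta p)} &= (\theta - 1)\beta^{(p)}, \\
\beta^{(\phi p)} &= (\phi - 1)\beta^{(p)}, & \beta^{(\theta\phi p)} &= (\theta - 1)(\phi - 1)\beta^{(p)}.
\end{aligned}
$$

For household-level randomization, similarly define $\xi = \theta\phi - 1$ and $\beta^{(\xi p)} = \xi\beta^{(p)}$. Then, model (B.4) is simplified to

$$-\log(e_i(t)) = \beta^{(b)\tau} I_i + \beta^{(\theta b)\tau} I_i^{(r)}(t) + \tilde J_i^{\tau} \beta^{(p)} \tilde I_i + \tilde J_i^{(r)}(t)^{\tau} \beta^{(\xi p)} \tilde I_i.$$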
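The expansion behind the heterogeneous model can be checked numerically in the same way as for the homogeneous case. The sketch below compares $-\log e_i(t)$ after the exponential approximation, written with $\theta^{r_i(t)}\phi^{r_j(t)}$, against the expanded form with the $(\theta - 1)$, $(\phi - 1)$ and $(\theta - 1)(\phi - 1)$ coefficient blocks. For brevity it indexes $b_u$ directly rather than using the contrast coding of $\beta^{(b)}$, which gives the same value. The rounded Table 5 estimates are used as inputs; $\theta = 0.15$ and $\phi = 0.34$ are taken as one minus the reported efficacies, and the contact list and function names are hypothetical.

```python
def hazard_direct(r_i, u_i, contacts, theta, phi, b, p):
    """-log e_i(t) after the exponential approximation; p[v][u] is from group v to group u."""
    community = theta ** r_i * b[u_i]
    household = sum(theta ** r_i * phi ** r_j * f * p[v_j][u_i]
                    for r_j, v_j, f in contacts)
    return community + household


def hazard_expanded(r_i, u_i, contacts, theta, phi, b, p):
    """Same quantity using the (theta-1), (phi-1) and (theta-1)(phi-1) blocks."""
    k = len(b)
    # J[v]: summed infectiousness of contacts in group v; J_r[v]: treated contacts only.
    J = [sum(f for r_j, v_j, f in contacts if v_j == v) for v in range(k)]
    J_r = [sum(r_j * f for r_j, v_j, f in contacts if v_j == v) for v in range(k)]
    base = sum(J[v] * p[v][u_i] for v in range(k))
    treated = sum(J_r[v] * p[v][u_i] for v in range(k))
    return (b[u_i] + (theta - 1) * r_i * b[u_i]
            + base + (theta - 1) * r_i * base
            + (phi - 1) * treated + (theta - 1) * (phi - 1) * r_i * treated)


# Group 0 = child, group 1 = adult; rounded estimates from Table 5.
b = [0.0023, 0.00055]
p = [[0.038, 0.012],   # child -> child, child -> adult
     [0.018, 0.022]]   # adult -> child, adult -> adult
theta, phi = 0.15, 0.34

# Hypothetical infectious household members: (r_j, group v_j, infectiousness f_j).
contacts = [(1, 0, 1.0), (0, 1, 0.5)]

for r_i in (0, 1):
    for u_i in (0, 1):
        direct = hazard_direct(r_i, u_i, contacts, theta, phi, b, p)
        expanded = hazard_expanded(r_i, u_i, contacts, theta, phi, b, p)
        print(f"r_i={r_i}, group={u_i}: direct={direct:.6f}  expanded={expanded:.6f}")
```

The two forms agree for every combination of treatment state and age group, since $\theta^{r}\phi^{s} = [1 + r(\theta - 1)][1 + s(\phi - 1)]$ when $r$ and $s$ are binary.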


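[Figure 1: timeline in days for an index case, marking the earliest potential infection day, the latest potential infection day and the illness onset day; the intervals back from the illness onset day are labelled with the maximum and minimum latent periods, $l_{\max}$ and $l_{\min}$.]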

Figure 1. Time sequence of the earliest and latest potential infection days and the illness onset day for an index case as determined by the minimum and maximum duration of the latent period. Members other than the index case in the household must have escaped infection up to day $\underline{t}_{d_i}$. If infected after day $\underline{t}_{d_i}$, a non-index case shows no illness symptoms up to day $\tilde t_{d_i}$ with a positive probability. This probability is 1 if the infection happens after day $\bar t_{d_i}$.


Table 1

Empirical cumulative distributions of the latent period and the infectious period for influenza (Elveback et al., 1976).

Latent period
  Duration (days)    Cumulative probability
  1                  0.2
  2                  0.8
  3                  1.0

Infectious period
  Duration (days)    Cumulative probability
  3                  0.3
  4                  0.7
  5                  0.9
  6                  1.0


Table 2

Comparison of MLEs by randomization schemes and household follow up schemes. Results are based on 1000 simulations.

                                  Estimate                   MC Standard Error(b)
Parameter(a)  Follow-up           Individual   Household     Individual   Household
                                  level        level         level        level
θ             Prospective         0.70         0.71          0.083        0.25
              Case-ascertained    0.70         0.71          0.083        0.26
φ             Prospective         0.20         0.24          0.045        0.16
              Case-ascertained    0.20         0.24          0.044        0.15

(a) True efficacy-related parameters are set to θ = 0.70, φ = 0.20.

(b) Monte Carlo standard errors.


Table 3

Two randomized multi-center trials conducted in North America and Europe in winter seasons for evaluating the efficacy of Oseltamivir, an influenza antiviral agent.

                                    Trial I                   Trial II
                                    (Welliver et al. 2001)    (Hayden et al. 2004)
Time of trial                       1998-1999                 2000-2001
Households                          372                       277
Population                          1329                      1110
Treatment for illness               None                      Oseltamivir
Duration of medication
  Illness treatment                 N/A                       5 days
  Prophylaxis                       7 days                    10 days
Follow up (symptom diary)           14 days                   30 days
Infected/Exposed (index)            165/372                   179/298
Infected/Exposed (susceptible)
  Control†                          38/464                    45/392
  Oseltamivir                       4/493                     14/420

Numbers may slightly differ from references due to different criteria of data inclusion for analysis.

† Participants in the control group received placebo in trial I and no treatment in trial II (see text for further explanation).


Table 4

Maximum likelihood estimates and iteratively re-weighted least squares estimates for pooled Oseltamivir trials in 1998-1999 and 2000-2001, North America and Europe.

Parameter   WLS (initial)(b)   IRLS     (s.e.)      MLE      (s.e.)      95% C.I. of MLE
b_1 (a)     0.0017             0.0011   (0.00022)   0.0012   (0.00036)   (0.00068, 0.0022)
b_2 (a)     0.0013             0.0011   (0.00016)   0.0011   (0.00026)   (0.00072, 0.0018)
p           0.030              0.020    (0.0019)    0.020    (0.0038)    (0.015, 0.026)
AVE_S       0.75               0.82     (0.069)     0.86     (0.087)     (0.70, 1.0)
AVE_I       0.45               0.64     (0.11)      0.62     (0.22)      (0.31, 1.0)

(a) CPIs are assumed different between the two trials.

(b) Non-iterative weighted least squares estimates. See appendix for the model description.


Table 5

Maximum likelihood estimates by age (1-17 vs 18+) for pooled Oseltamivir trials conducted in 1998-1999 and 2000-2001, North America and Europe.

Parameter     MLE       s.e.     95% C.I.
b_c (a)       0.0023    0.0005   (0.0015, 0.0035)
b_a           0.00055   0.0002   (0.0003, 0.001)
p_cc          0.038     0.01     (0.023, 0.063)
p_ca          0.012     0.004    (0.007, 0.021)
p_ac          0.018     0.007    (0.008, 0.040)
p_aa          0.022     0.005    (0.014, 0.034)
AVE_S         0.85      0.09     (0.69, 1.0)
AVE_I         0.66      0.20     (0.36, 1.0)
SAR_cc (b)    0.15      0.035    (0.074, 0.21)
SAR_ca        0.049     0.014    (0.021, 0.075)
SAR_ac        0.071     0.028    (0.014, 0.13)
SAR_aa        0.086     0.020    (0.047, 0.12)

(a) Subscript c denotes child (1-17), a denotes adult (18+), and ca denotes child-to-adult transmission.

(b) SAR_vu is based on the average 4.1 days of infectious period, i.e., $\mathrm{SAR}_{vu} = 1 - (1 - p_{vu})^{4.1}$.


Table 6

Assessing goodness-of-fit of the likelihood model for pooled Oseltamivir trials

conducted in 1998-1999 and 2000-2001, North America and Europe.

Risk

Level

1

2

3

4

5

6

7

8

9

10

Total

Observed # of illness onsets

Predicted # of illness onsets

Person-days

2084

1321

15878

1434

8165

933

935

1241

1084

894

0

0

8

1

0

0

9

3

1922

3

5

3

4

9 12

17

25

18

27
