Statistica Sinica 17(2007), 387-409
A GENERALIZED DROP-THE-LOSER URN FOR CLINICAL
TRIALS WITH DELAYED RESPONSES
Li-Xin Zhang¹, Wai Sum Chan², Siu Hung Cheung² and Feifang Hu³
¹Zhejiang University, ²Chinese University of Hong Kong
and ³University of Virginia
Abstract: Urn models are popular and useful for adaptive designs in clinical studies.
Among various urn models, the drop-the-loser rule is an efficient adaptive treatment
allocation scheme, recently proposed for comparing different treatments in a clinical
trial. This rule is superior to other randomization schemes in terms of variability
and power. In this paper, the drop-the-loser rule is generalized to cope with more
popular and practical circumstances, including (1) delayed responses when test
results cannot be obtained immediately, (2) continuous responses, and (3) a pre-
specified target of allocation proportion. In addition, our proposed procedure has
several favorable asymptotic properties such as strong consistency and asymptotic
normality of the allocation proportions.
Key words and phrases: Asymptotic normality, asymptotic power, clinical trial,
delayed response, randomized play-the-winner rule, strong consistency.
1. Introduction
In clinical trials, patients usually accrue sequentially. One of the fundamental
concerns is treatment allocation: which treatment should be assigned to the
next patient? The general consensus is that a randomization scheme should be
adopted to minimize selection bias and to provide a solid basis for statistical
inference. Adaptive designs can be valuable and ethical randomization schemes
that formulate treatment allocation as a function of previous responses. One
major objective of research in adaptive design is to develop treatment allocation
schemes, so that more patients receive the better treatment.
Pioneering works in the area of adaptive design can be traced to Thompson
(1933) and Robbins (1952). Since then, a steady stream of research
in this area has offered various approaches to treatment allocation schemes
applicable to clinical studies. For a discussion of recent developments in this
area, refer to Rosenberger (1996), Rosenberger and Lachin (2002), and references
therein.
Among different classes of adaptive designs, the one based on urn models
receives the most attention. Early works include Athreya and Karlin (1968),
Wei and Durham (1978) and Wei (1979). The basic idea is as follows: there
are various types of balls representing particular treatments; patients accrue se-
quentially; at each stage, the probability of allocating a particular treatment
to a patient depends on the numbers of various types of balls in the urn. The
response of each patient after treatment plays an essential role in the determi-
nation of subsequent urn compositions. The basic strategy is to “reward” more
balls to successful treatments. The multi-treatment randomized play-the-winner
rule (Andersen, Faries and Tamura (1994)) is an illustrative example. An urn
contains $K$ different types of balls, representing $K$ different treatments. When a
patient arrives, a ball is drawn at random with replacement. If it is a type $i$ ball,
the patient receives treatment $i$. A successful response to the treatment brings
an addition of a type $i$ ball to the urn. If the response is a failure, a ball is added
to the urn. This ball is partitioned according to the existing proportion of balls
for other treatments in the urn.
A sophisticated formulation of the urn model was given by Durham, Flournoy
and Li (1998). They derived a valuable randomized version of the generalized
Pólya urn that does not satisfy the regularity conditions of those studied by
Athreya and Ney (1972). One major feature of the randomized Pólya urn scheme
is that it rewards only successful treatments; balls are not added to the urn if the
treatment is a failure. Parallel ideas can be useful in other areas besides clinical
applications. For example, Beggs (2005) and Hopkins and Posch (2005) use
related urn concepts to model reinforcement learning in their study of economic
behaviors.
The importance of the randomized Pólya urn scheme is that it can be embedded
in the family of continuous-time pure birth processes with linear birth
rate (Yule processes). This enables the formulation of important limiting behaviors
of the urn process (Ivanova and Flournoy (2001)). Using the framework
of embedding the urn scheme in a continuous-time birth and death process
(Ivanova, Rosenberger, Durham and Flournoy (2000), Ivanova and Flournoy
(2001)), Ivanova (2003) constructed the drop-the-loser (DL) urn.
The DL rule differs from the randomized Pólya urn of Durham, Flournoy and
Li (1998). Instead of adding balls to reward successes, balls are removed when
failures are observed. In the urn, besides treatment balls, there are immigration
balls. When an immigration ball is selected, balls will be added to all types
(except immigration), preventing extinction of types of treatment balls. The
mechanism, and other properties of the DL rule, will be outlined in Section 2.
The DL rule was reported to have small variability and high statistical power
(Ivanova (2003)). One sensible objective of clinical studies is to increase the
power of treatment comparisons. Power depends heavily on the variability of
the treatment allocation scheme. Simulation evidence indicating the strong as-
sociation between power and variability can be found in Melfi and Page (1998),
as well as in Rosenberger, Stallard, Ivanova, Harper and Ricks (2001). A proof
in Hu and Rosenberger (2003) confirmed that average power of a randomization
procedure is a decreasing function of the variability of the randomization proce-
dure. Therefore, adaptive designs with smaller variability are much preferred.
Recently Hu and Rosenberger (2003) launched a comparative study of sev-
eral recent adaptive randomization procedures for binary responses: the se-
quential maximum likelihood procedure (SMLP) (Melfi and Page (2000)), the
doubly adaptive biased coin design (DBCD) (Eisele (1994)), the generalized
DBCD (Hu and Zhang (2004a)), the randomized play-the-winner (RPW) rule
(Wei and Durham (1978)), and the drop-the-loser (DL) rule (Ivanova (2003)).
Their study yielded results favoring the adoption of the DL rule due to its low
variability. For details, one can refer to Hu and Rosenberger (2003) and Hu,
Rosenberger and Zhang (2006).
The DL rule has been shown to yield satisfactory results in terms of reducing
the number of failures and variability (Rosenberger and Hu (2004)). Neverthe-
less, it has limitations. First, there is a lack of clear methodology to cope with de-
layed test responses which are common in clinical studies. Second, the application
of the rule is limited to clinical trials with binary responses. Third, it can only
be applied to target one particular allocation proportion (Ivanova (2003)) while
different targets might be of interest in clinical studies (Rosenberger and Lachin
(2002)). In fact, there is a growing interest in target-based designs which are
derived with a pre-specified allocation target (see for example Eisele (1994),
Eisele and Woodroofe (1995), Melfi and Page (1998, 2000)).
We derive a generalized DL (GDL) rule that differs from other popular urn
models in its capability to handle delayed responses and to include pre-specified
targets. In Section 2, the DL rule and its major properties will be outlined. Then
the GDL rule is defined. Simulation results indicate that with delayed responses,
our proposed scheme performs reasonably well. In Section 3, asymptotic prop-
erties and variability comparisons are presented. Some general comments and
remarks are given in Section 4. When the responses are dichotomous, the GDL
rule is shown to be asymptotically most powerful. Proofs are given in the last
section. The main technique used in this paper involves the strong approximation
of a martingale, and is different from the techniques employed in Ivanova (2003)
and Ivanova et al. (2000). Furthermore, we show that the allocation process can
be approximated by a standard Wiener process. The asymptotic normality, the
rate of convergence and a law of the iterated logarithm are directly obtained from
this approximation.
2. The Generalized Drop-the-Loser Rule
In this section we first describe the drop-the-loser rule (Ivanova (2003)) and
its major statistical properties. Then our proposed generalized drop-the-loser
rule will be introduced.
2.1. Drop-the-loser rule
For explanatory purposes, assume that we have two treatments, even though
the DL rule can be applied to multiple treatments. The DL rule is as follows.
Consider an urn containing three types of balls. Balls of types 1 and 2 represent
treatments. Balls of type 0 are termed immigration balls. We start with
$Z_{0,i}$ balls of type $i$, $i = 0, 1, 2$. Let $Z_0 = (Z_{0,0}, Z_{0,1}, Z_{0,2})$ be the initial urn
composition. After $m$ draws, the urn composition becomes $Z_m = (Z_{m,0}, Z_{m,1}, Z_{m,2})$.
When a subject arrives, one ball is drawn at random. If a treatment ball of
type $k$ (1 or 2) is selected, the $k$th treatment is given to the subject and the
response is observed. If it is a failure, the ball is not replaced: $Z_{m+1,k} = Z_{m,k} - 1$,
$Z_{m+1,j} = Z_{m,j}$, $j \neq k$. If the treatment is a success, the ball is replaced and
consequently, the urn composition remains unchanged: $Z_{m+1} = Z_m$. If an immigration
ball (type 0) is selected, no subject is treated, and the ball is returned to
the urn together with two additional treatment balls, one of each treatment type.
Therefore, $Z_{m+1,0} = Z_{m,0}$ and $Z_{m+1,k} = Z_{m,k} + 1$, $k = 1, 2$. This procedure is
repeated until a treatment ball is drawn and the subject treated accordingly. The
function of the immigration ball is to avoid the extinction of a type of treatment
ball.
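The urn mechanics just described can be illustrated with a minimal simulation sketch. It assumes immediate binary responses; the function name `simulate_dl`, its arguments and the default initial composition are illustrative choices, not part of the original paper.

```python
import random


def simulate_dl(p_success, n_subjects, z0=(1, 1, 1), rng=None):
    """Sketch of the two-treatment drop-the-loser urn with immediate responses.

    p_success: (P1, P2), assumed success probabilities (hypothetical inputs).
    z0: initial counts (immigration, type 1, type 2); an illustrative default.
    Returns [N1, N2], the number of subjects assigned to each treatment.
    """
    rng = rng or random.Random()
    urn = list(z0)          # urn[0]: immigration balls, urn[k]: type-k balls
    assigned = [0, 0]
    treated = 0
    while treated < n_subjects:
        # Draw one ball at random, with probability proportional to the counts.
        ball = rng.choices([0, 1, 2], weights=urn)[0]
        if ball == 0:
            # Immigration ball: returned, plus one ball of each treatment type.
            urn[1] += 1
            urn[2] += 1
            continue
        assigned[ball - 1] += 1
        treated += 1
        success = rng.random() < p_success[ball - 1]
        if not success:
            urn[ball] -= 1  # failure: the drawn ball is not replaced
        # success: the ball is replaced, so the urn is unchanged
    return assigned


if __name__ == "__main__":
    n1, n2 = simulate_dl((0.8, 0.6), 500, rng=random.Random(2007))
    print("proportion on treatment 1:", n1 / (n1 + n2))
```

Because the immigration count never changes and is positive, the total weight passed to `rng.choices` is always positive, even if one treatment type is temporarily extinct.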
Let $P_k$ be the probability of success on treatment $k$, and $Q_k = 1 - P_k$,
$k = 1, 2$. Ivanova (2003) studied the properties of the DL rule by embedding
the urn composition process $Z_m$ in an immigration-death process. She defined
a two-dimensional process $Z^*(t) = (Z_1^*(t), Z_2^*(t))$, which is a collection of two
continuous-time linear immigration-death processes having common immigration
processes with immigration rate $Z_{0,0}$ and independent death processes with
death rates $Q_1$, $Q_2$, such that $Z_{m,k} = Z_k^*(t_m)$, $k = 1, 2$. Here $t_m$ is the “time” of
the $m$th draw and it is the partial sum of a sequence of independent exponentially
distributed random variables with rate parameter 1. Note that $t$ represents
a “virtual” time instead of the real time. The embedding technique was developed
by Athreya and Karlin (1968) and Athreya and Ney (1972) for the study of
the Pólya urn model. Later it was adopted by Durham, Flournoy and Li (1998),
Ivanova et al. (2000), Ivanova and Flournoy (2001) for studying sequential clinical
trials.
Now, let us state a couple of important asymptotic results of the DL rule.
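Let $N_k(t)$ be the number of trials on treatment $k$ up to time $t$, $k = 1, 2$. Ivanova
(2003) showed that
$$\frac{N_1(t)}{N_1(t) + N_2(t)} \xrightarrow{P} v_1 := \frac{1/Q_1}{1/Q_1 + 1/Q_2} \quad \text{as } t \to \infty, \tag{2.1}$$
$$\sqrt{N_1(t) + N_2(t)} \left( \frac{N_1(t)}{N_1(t) + N_2(t)} - v_1 \right) \xrightarrow{D} N(0, \sigma^2_{DL}) \quad \text{as } t \to \infty, \tag{2.2}$$
where
$$\sigma^2_{DL} = \frac{Q_1 Q_2 (P_1 + P_2)}{(Q_1 + Q_2)^3}, \tag{2.3}$$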
is the asymptotic variance. The DL rule has two fundamental properties: (1) it
preserves the randomization ingredient of the randomized play-the-winner rule,
which yields a non-deterministic scheme; (2) when compared with many other
adaptive designs which have the same limiting proportions as the DL rule, such as
the SMLP, the DBCD and the RPW rules, the DL rule generates an allocation
procedure with the minimum asymptotic variance, hence produces higher power
for the test of the difference of proportions (Hu and Rosenberger (2003)).
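As a numerical illustration of (2.1) and (2.3), the following sketch evaluates the limiting proportion and the asymptotic variance for given success probabilities. The function name and the example inputs are illustrative assumptions.

```python
def dl_limit_and_variance(p1, p2):
    """Evaluate v1 from (2.1) and sigma^2_DL from (2.3) for success
    probabilities p1, p2 (both assumed to be strictly less than 1)."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    v1 = (1.0 / q1) / (1.0 / q1 + 1.0 / q2)
    var_dl = q1 * q2 * (p1 + p2) / (q1 + q2) ** 3
    return v1, var_dl


if __name__ == "__main__":
    for p1, p2 in [(0.8, 0.6), (0.5, 0.2)]:
        v1, var_dl = dl_limit_and_variance(p1, p2)
        print(f"P1={p1}, P2={p2}: v1={v1:.3f}, sigma2_DL={var_dl:.3f}")
```

For $P_1 = 0.8$, $P_2 = 0.6$ this gives $v_1 = 2/3$ and $\sigma^2_{DL} = 0.112/0.216 \approx 0.519$.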
In practice, subjects frequently do not respond immediately. Therefore, the
response of an individual may not be available prior to the randomization of
the next subject. Delayed response is a scenario in clinical trials that deserves
much attention. Besides delayed response, the DL rule is incapable of dealing
with non-dichotomous responses. Basically, when the outcomes are delayed or
non-dichotomous, it is difficult to embed the sequence of urn compositions in an
immigration-birth-death process.
2.2. Generalized drop-the-loser rule
In this section, the GDL is outlined. The treatment allocation scheme is more
flexible than the DL rule and accommodates the possibility of delayed responses
and pre-assigned allocation proportion targets.
Similar to the DL rule, there are three types of balls in the urn. Balls of types
1 and 2 represent treatments; balls of type 0 are immigration balls. We start with
$Z_{0,i}$ ($> 0$) balls of type $i$, $i = 0, 1, 2$. Let $Z_0 = (Z_{0,0}, Z_{0,1}, Z_{0,2})$ be the initial urn
composition, and $Z_m = (Z_{m,0}, Z_{m,1}, Z_{m,2})$ be the urn composition after $m$ draws.
Let $Z^+_{m,i} = \max(0, Z_{m,i})$, $i = 0, 1, 2$, and $Z^+_m = (Z^+_{m,0}, Z^+_{m,1}, Z^+_{m,2})$. When a
subject arrives to be allocated to a treatment, a ball is drawn at random according
to the urn composition $Z^+_m$ for the appropriate $m$. That is, the probability of
selecting a type $i$ ball is $Z^+_{m,i}/|Z^+_m|$, with $|Z^+_m| = Z^+_{m,0} + Z^+_{m,1} + Z^+_{m,2}$.
If an immigration ball (type 0) is drawn, no treatment is assigned and the
ball is returned to the urn along with $a_k$ type $k$ treatment balls, $k = 1, 2$. Let
$A = a_1 + a_2$ where $a_1, a_2 > 0$. This step is repeated until a treatment ball is
drawn.
If a type $k$ ($k = 1, 2$) treatment ball is drawn, the subject is assigned to
treatment $k$ and the ball is not replaced immediately. To allow for delayed
responses, the addition of balls is made after the subject’s response is observed.
We denote the outcome of this subject on treatment $k$ by $Y_{m,k}$. The outcome $Y_{m,k}$
may not be available prior to the arrival of the next subject. In fact, the delayed
outcome may only be available after several subjects (a random variable) have
been allocated to treatments. After the response $Y_{m,k}$ is observed, $D_{m,k}$ ($\geq 0$)
balls of type $k$ are added to the urn.
We allow the urn to have a fractional or negative number of treatment balls.
According to the definition of $Z^+_{m,i}$, a treatment ball type with a negative count
will never be selected. As a result, the number of treatment balls of a given type
will not decrease while it is negative. So $Z_{m,i} \geq -1$ for all $m$ and $i$.
Let $N_{n,k}$ be the number of subjects assigned to treatment $k$, $k = 1, 2$, after
the allocation of treatments to $n$ subjects. It is important to study the statistical
behavior of the proportions of patients $N_{n,k}/n$, $k = 1, 2$, assigned to the two
treatments.
Let $p_k = E[D_{m,k}]$, $k = 1, 2$. We assume $0 \leq p_k < 1$ and $q_k = 1 - p_k$, $k = 1, 2$.
Thus, after each treated subject, the expected number of balls added according
to the outcome observed is not larger than the number of outgoing balls (which
is 1).
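The GDL mechanism with delayed responses can be sketched as an event-driven simulation. This is only an illustration under stated assumptions: a binary adding rule ($D_{m,k} = 1$ for success, 0 for failure), exponential entry and delay times, fixed $a_k$, and hypothetical names such as `simulate_gdl`, `delay_means` and `entry_mean`.

```python
import heapq
import random


def simulate_gdl(p_success, a, n_subjects, delay_means=(1.0, 1.0),
                 entry_mean=1.0, z0=(1.0, 1.0, 1.0), rng=None):
    """Sketch of the GDL urn with delayed binary responses.

    a = (a1, a2): balls added per immigration draw (assumed fixed).
    delay_means, entry_mean: assumed means of exponential delay/entry times.
    Returns [N1, N2], the number of subjects on each treatment.
    """
    rng = rng or random.Random()
    urn = list(z0)        # counts may become fractional or negative
    pending = []          # heap of (response_time, treatment, balls_to_add)
    assigned = [0, 0]
    now = 0.0
    for _ in range(n_subjects):
        now += rng.expovariate(1.0 / entry_mean)   # next subject arrives
        # Add D balls for every response observed before this arrival.
        while pending and pending[0][0] <= now:
            _, k, d = heapq.heappop(pending)
            urn[k] += d
        while True:
            weights = [max(0.0, z) for z in urn]   # draw according to Z^+
            ball = rng.choices([0, 1, 2], weights=weights)[0]
            if ball != 0:
                break
            urn[1] += a[0]                         # immigration step
            urn[2] += a[1]
        urn[ball] -= 1                             # ball not replaced for now
        assigned[ball - 1] += 1
        success = rng.random() < p_success[ball - 1]
        d = 1.0 if success else 0.0                # binary adding rule
        delay = rng.expovariate(1.0 / delay_means[ball - 1])
        heapq.heappush(pending, (now + delay, ball, d))
    return assigned


if __name__ == "__main__":
    n1, n2 = simulate_gdl((0.8, 0.6), (1.0, 1.0), 500,
                          delay_means=(5.0, 1.0), rng=random.Random(17))
    print("proportion on treatment 1:", n1 / (n1 + n2))
```

Responses still outstanding when the last subject is randomized are simply ignored here, since they cannot influence any allocation.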
The DL rule is a particular case of our GDL allocation scheme. For instance,
with dichotomous responses and two treatments, the DL rule corresponds to the
GDL rule with $a_1 = 1$, $a_2 = 1$ and $D_{m,k} = 1$ if the outcome of treatment $k$ is a
success, and 0 otherwise. In addition, $p_k = P_k$, the success probability of a trial
on treatment $k$, $k = 1, 2$.
When the outcomes are not dichotomous, one may choose suitable adding
rules $\{D_{m,k}\}$ to define a design. For example, the outcome of a patient after treatment
of cancer can be classified as “clinically ineffective”, “gradual improvement
with extended treatment” or “fully recovered”; one may define $D_{m,k} = 1$ if the
outcome is “fully recovered”, $D_{m,k} = \Lambda$ ($0 < \Lambda < 1$) if the outcome is “gradual
improvement with extended treatment”, and $D_{m,k} = 0$ if the outcome is
“clinically ineffective”.
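For such a three-category adding rule, the quantities $p_k = E[D_{m,k}]$ and $\mathrm{Var}(D_{m,k})$ follow directly from the category probabilities. The sketch below assumes hypothetical category probabilities and a hypothetical value of $\Lambda$; the function name is illustrative.

```python
def adding_rule_moments(probs, lam):
    """Mean and variance of D for the three-category rule D in {0, lam, 1}.

    probs = (P(ineffective), P(gradual improvement), P(fully recovered)),
    assumed to sum to 1; lam plays the role of Lambda with 0 < lam < 1.
    """
    _p_ineffective, p_gradual, p_recovered = probs
    mean = lam * p_gradual + p_recovered              # p_k = E[D]
    second_moment = lam ** 2 * p_gradual + p_recovered
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    p_k, var_k = adding_rule_moments((0.2, 0.3, 0.5), lam=0.5)
    print("p_k =", p_k, "Var(D) =", var_k)
```

The requirement $0 \leq p_k < 1$ holds whenever the probability of “fully recovered” is below 1.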
Under some suitable conditions (stated in Section 3), we can show that the
proportion of subjects assigned to treatment $k$ satisfies
$$\frac{N_{n,k}}{n} \to v_k := \frac{a_k/q_k}{a_1/q_1 + a_2/q_2} \quad \text{a.s.}, \quad k = 1, 2. \tag{2.4}$$
With dichotomous outcomes, if $a_1 = a_2$, and $D_{m,k} = 1$ for success and $D_{m,k} = 0$
for failure when type $k$ treatment is assigned, the limiting proportions $v_k$,
$k = 1, 2$, are the same as in (2.1). One can choose the $a_k$'s to adjust the allocation
proportions. By choosing the $a_k$'s suitably, the GDL rule can be used to target any
desired allocation.
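One way to see how the $a_k$'s can be tuned: since (2.4) depends on the ratios $a_k/q_k$, taking $a_k$ proportional to $v_k q_k$ recovers a desired target $v_k$. The sketch below illustrates this under the assumption that the $q_k$ are known; the function names and the normalization `scale` (which fixes $A = a_1 + a_2$) are illustrative.

```python
def gdl_limit(a, q):
    """Limiting proportions v_k from (2.4) for immigration counts a and q_k."""
    weights = [ak / qk for ak, qk in zip(a, q)]
    total = sum(weights)
    return [w / total for w in weights]


def immigration_counts_for_target(target, q, scale=2.0):
    """Choose a_k proportional to target_k * q_k, normalized so sum(a) = scale.

    'scale' is a hypothetical tuning constant, not prescribed by the paper.
    """
    raw = [vk * qk for vk, qk in zip(target, q)]
    total = sum(raw)
    return [scale * r / total for r in raw]


if __name__ == "__main__":
    q = (0.2, 0.4)
    a = immigration_counts_for_target((0.6, 0.4), q)
    print("a =", a, "limit =", gdl_limit(a, q))
```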
A more convenient approach to target a pre-specified allocation proportion
is to take $D_{m,k} \equiv 0$ for all $m$ and $k$. Hence, $q_k = 1 - E D_{m,k} \equiv 1$. If the
target allocation proportion is $v_k$ ($k = 1, 2$), we can simply define a design by
choosing $a_k = C v_k$ where $C$ is a constant and $v_k$ is a function of $P_k$. For example,
Rosenberger et al. (2001) studied the allocation proportions
$$\frac{\sqrt{P_k}}{\sqrt{P_1} + \sqrt{P_2}}, \quad k = 1, 2, \tag{2.5}$$
which minimize the expected number of failures under fixed variance of the estimator
of the treatment difference. In this case, we take
$$a_k = \frac{C \sqrt{P_k}}{\sqrt{P_1} + \sqrt{P_2}}, \quad k = 1, 2, \tag{2.6}$$
and the balls are added only through immigration. The superior treatment (the
one with larger probability of success) will be rewarded more balls each time an
immigration ball is selected. The simulation study in the following section indicates
that there is no significant difference among various choices of $C$.
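The target (2.5) and the immigration counts (2.6) are simple to evaluate. The sketch below assumes known success probabilities and uses $C = 2$ as an illustrative default; the function names are hypothetical.

```python
import math


def sqrt_target(p1, p2):
    """Allocation proportions (2.5) for success probabilities p1, p2."""
    s1, s2 = math.sqrt(p1), math.sqrt(p2)
    return s1 / (s1 + s2), s2 / (s1 + s2)


def immigration_counts(p1, p2, c=2.0):
    """Immigration counts a_k = C * v_k as in (2.6), used with D = 0.

    c corresponds to the constant C; the default of 2 is an assumption.
    """
    v1, v2 = sqrt_target(p1, p2)
    return c * v1, c * v2


if __name__ == "__main__":
    print(sqrt_target(0.8, 0.6))
    print(immigration_counts(0.5, 0.2))
```

For $P_1 = 0.8$, $P_2 = 0.6$ the target for treatment 1 is about 0.536.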
Remark 2.1. In practice, the $P_k$ are usually unknown. In these cases, simply
substitute $\hat{P}_k$ for $P_k$, where $\hat{P}_k$ is the current estimate of $P_k$, $k = 1, 2$. We propose
the estimate
$$\hat{P}_k = \frac{(\text{number of observed successes on treatment } k) + 1}{(\text{number of observed outcomes on treatment } k) + 2},$$
which is the Bayesian estimate of $P_k$ with a uniform prior distribution, $k = 1, 2$.
Alternatively, one can replace 1 in the numerator by $\alpha$ and 2 in the denominator by
$\alpha + \beta$ if the beta distribution beta($\alpha$, $\beta$) is employed as the prior distribution,
with the constants $\alpha$ and $\beta$ estimated from earlier trials.
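The estimate in Remark 2.1 is a one-line computation. The sketch below uses a hypothetical function name; the defaults `alpha = beta = 1` correspond to the uniform prior.

```python
def bayes_success_estimate(successes, outcomes, alpha=1.0, beta=1.0):
    """Posterior-mean estimate of P_k under a beta(alpha, beta) prior.

    successes: observed successes on treatment k so far.
    outcomes: observed (non-pending) outcomes on treatment k so far.
    """
    return (successes + alpha) / (outcomes + alpha + beta)


if __name__ == "__main__":
    print(bayes_success_estimate(0, 0))    # before any outcome is observed
    print(bayes_success_estimate(7, 10))   # 7 successes among 10 outcomes
```

With delayed responses, only outcomes that have actually been observed enter the counts, so the estimate is well defined from the first draw onward.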
2.3. Simulation results
In this section, a simulation study is performed to investigate the perfor-
mance of our allocation scheme. Two different allocation targets, (2.1) and (2.5),
are employed as our study cases. Given treatments 1 and 2 with success probabilities
$P_1$ and $P_2$ respectively, our simulation study is performed with $P_1$ and
$P_2$ being selected with reference to those choices of Hu and Rosenberger (2003).
For the allocation process, $\hat{P}_k$ given in Remark 2.1 is utilized.
For both the delay times for the two treatments and the patient entry
times, exponential distributions are used. The mean parameters of the delay
times for treatments 1 and 2 are $\lambda_1$ and $\lambda_2$ respectively. For patient entry times,
the mean parameter is $\lambda_3$. There are three different configurations for the mean
parameters. The first one corresponds to the case where there are no delayed
responses. The second one corresponds to $(\lambda_1, \lambda_2, \lambda_3) = (1, 1, 1)$, which represents
similar delay times for the responses of the two treatments. Finally, we select
$(\lambda_1, \lambda_2, \lambda_3)$ to be $(5, 1, 1)$ to represent a large difference in delay times for the
responses of the two treatments. As explained earlier, for simplicity we pick
$D_{m,k} = 0$ for the GDL rules.
The number of subjects $n$ is chosen to be 100 and 500. The number of
replications in our simulation study is 10,000. The proportions of subjects being
allocated to treatment 1, $N_{n,1}/n$, are tabulated in Tables 1 and 2, since $N_{n,2}/n$
is simply $1 - N_{n,1}/n$.
Table 1. Simulated allocation proportion ($N_{n,1}/n$) of the DL rule, the GDL rule and
the DBCD with allocation target $v_1$ given in (2.1).

                  DL                        GDL (1)                   DBCD
p1, p2     v1     n = 100      n = 500      n = 100      n = 500      n = 100      n = 500
Immediate Response
0.8, 0.8   0.50   0.50(0.069)  0.50(0.041)  0.50(0.102)  0.50(0.058)  0.50(0.103)  0.50(0.049)
0.8, 0.6   0.67   0.62(0.060)  0.66(0.031)  0.63(0.079)  0.66(0.042)  0.65(0.076)  0.66(0.037)
0.7, 0.5   0.63   0.60(0.053)  0.62(0.026)  0.60(0.067)  0.62(0.035)  0.62(0.065)  0.62(0.030)
0.5, 0.5   0.50   0.50(0.047)  0.50(0.022)  0.50(0.057)  0.50(0.029)  0.50(0.056)  0.50(0.026)
0.5, 0.2   0.62   0.61(0.035)  0.61(0.016)  0.60(0.042)  0.61(0.021)  0.61(0.043)  0.61(0.020)
0.2, 0.2   0.50   0.50(0.025)  0.50(0.011)  0.50(0.030)  0.50(0.015)  0.50(0.034)  0.50(0.016)
(λ1, λ2, λ3) = (1, 1, 1)
0.8, 0.8   0.50   0.50(0.066)  0.50(0.041)  0.50(0.099)  0.50(0.057)  0.50(0.103)  0.50(0.049)
0.8, 0.6   0.67   0.62(0.058)  0.66(0.031)  0.63(0.078)  0.66(0.042)  0.65(0.076)  0.66(0.037)
0.7, 0.5   0.63   0.60(0.052)  0.62(0.026)  0.60(0.066)  0.62(0.034)  0.62(0.065)  0.62(0.030)
0.5, 0.5   0.50   0.50(0.046)  0.50(0.022)  0.50(0.058)  0.50(0.029)  0.50(0.056)  0.50(0.026)
0.5, 0.2   0.62   0.61(0.035)  0.61(0.016)  0.60(0.041)  0.61(0.021)  0.61(0.043)  0.61(0.020)
0.2, 0.2   0.50   0.50(0.025)  0.50(0.011)  0.50(0.030)  0.50(0.015)  0.50(0.034)  0.50(0.016)
(λ1, λ2, λ3) = (5, 1, 1)
0.8, 0.8   0.50   0.47(0.060)  0.49(0.040)  0.49(0.099)  0.50(0.057)  0.50(0.104)  0.50(0.049)
0.8, 0.6   0.67   0.59(0.055)  0.65(0.030)  0.63(0.077)  0.66(0.042)  0.65(0.078)  0.66(0.037)
0.7, 0.5   0.63   0.58(0.049)  0.62(0.026)  0.60(0.066)  0.62(0.035)  0.61(0.066)  0.63(0.030)
0.5, 0.5   0.50   0.50(0.045)  0.50(0.022)  0.50(0.056)  0.50(0.029)  0.50(0.057)  0.50(0.026)
0.5, 0.2   0.62   0.60(0.033)  0.61(0.016)  0.60(0.042)  0.61(0.021)  0.61(0.044)  0.61(0.020)
0.2, 0.2   0.50   0.50(0.025)  0.50(0.011)  0.50(0.030)  0.50(0.015)  0.50(0.035)  0.50(0.016)

Simulated standard deviations are given in parentheses.
GDL (1): $a_1 = 2v_1$, $a_2 = 2(1 - v_1)$.
Table 2. Simulated allocation proportion ($N_{n,1}/n$) of two GDL rules and
the DBCD with allocation target $v_1$ given in (2.5).

                  GDL (2)                   GDL (3)                   DBCD
p1, p2     v1     n = 100      n = 500      n = 100      n = 500      n = 100      n = 500
Immediate Response
0.8, 0.8   0.50   0.50(0.019)  0.50(0.008)  0.50(0.018)  0.50(0.008)  0.50(0.027)  0.50(0.012)
0.8, 0.6   0.54   0.53(0.023)  0.54(0.011)  0.53(0.023)  0.54(0.011)  0.54(0.030)  0.54(0.013)
0.7, 0.5   0.54   0.54(0.028)  0.54(0.013)  0.54(0.028)  0.54(0.013)  0.54(0.033)  0.54(0.014)
0.5, 0.5   0.50   0.50(0.032)  0.50(0.015)  0.50(0.032)  0.50(0.016)  0.50(0.036)  0.50(0.017)
0.5, 0.2   0.61   0.59(0.042)  0.61(0.024)  0.59(0.042)  0.61(0.024)  0.61(0.049)  0.61(0.022)
0.2, 0.2   0.50   0.50(0.051)  0.50(0.029)  0.50(0.051)  0.50(0.029)  0.50(0.058)  0.50(0.026)
(λ1, λ2, λ3) = (1, 1, 1)
0.8, 0.8   0.50   0.50(0.018)  0.50(0.008)  0.50(0.018)  0.50(0.008)  0.50(0.027)  0.50(0.012)
0.8, 0.6   0.54   0.53(0.023)  0.54(0.011)  0.53(0.022)  0.54(0.011)  0.54(0.030)  0.54(0.013)
0.7, 0.5   0.54   0.54(0.028)  0.54(0.013)  0.54(0.027)  0.54(0.013)  0.54(0.033)  0.54(0.015)
0.5, 0.5   0.50   0.50(0.032)  0.50(0.015)  0.50(0.032)  0.50(0.015)  0.50(0.036)  0.50(0.016)
0.5, 0.2   0.61   0.59(0.042)  0.61(0.024)  0.59(0.042)  0.61(0.024)  0.61(0.049)  0.61(0.022)
0.2, 0.2   0.50   0.50(0.051)  0.50(0.029)  0.50(0.051)  0.50(0.029)  0.50(0.058)  0.50(0.026)
(λ1, λ2, λ3) = (5, 1, 1)
0.8, 0.8   0.50   0.50(0.019)  0.50(0.008)  0.50(0.017)  0.50(0.008)  0.50(0.027)  0.50(0.012)
0.8, 0.6   0.54   0.53(0.023)  0.53(0.011)  0.53(0.023)  0.53(0.011)  0.54(0.030)  0.54(0.013)
0.7, 0.5   0.54   0.54(0.028)  0.54(0.013)  0.54(0.028)  0.54(0.013)  0.54(0.033)  0.54(0.015)
0.5, 0.5   0.50   0.50(0.031)  0.50(0.015)  0.50(0.032)  0.50(0.015)  0.50(0.037)  0.50(0.016)
0.5, 0.2   0.61   0.59(0.042)  0.61(0.024)  0.59(0.041)  0.61(0.024)  0.61(0.049)  0.61(0.022)
0.2, 0.2   0.50   0.50(0.050)  0.50(0.029)  0.50(0.052)  0.50(0.029)  0.50(0.058)  0.50(0.026)

Simulated standard deviations are given in parentheses.
GDL (2): $a_1 = 2v_1$, $a_2 = 2(1 - v_1)$.
GDL (3): $a_1 = 2(\sqrt{p_1} + \sqrt{p_2}) v_1 = 2\sqrt{p_1}$, $a_2 = 2(\sqrt{p_1} + \sqrt{p_2})(1 - v_1) = 2\sqrt{p_2}$.
For comparison purposes, the DBCD is also included. The allocation scheme
used in this simulation study follows that of Rosenberger and Hu (2004) closely.
In addition their suggested value of 2, for the parameter that determines the
variability of the allocation proportions arising from the randomized procedure,
is adopted.
For Table 1, the allocation target given in (2.1) is used. Even though the
DL rule was not designed for delayed responses, for exploratory purposes it is
included in the cases with delayed responses. A simplistic approach is adopted.
When a treatment ball is chosen, the action of whether to return the ball or
not is deferred until the response is observed. We have the following findings.
For large sample sizes ($n = 500$) and/or without delayed responses, both the
DL rule and the GDL rules are able to provide allocation proportions very close
to the target. For smaller sample sizes ($n = 100$) and delayed responses, the
DL rule is outperformed by the GDL rule and the DBCD, especially when both
treatments have high success rates (for example, $P_1 = 0.8$, $P_2 = 0.6$). Note that
the variances of the GDL rule and the DBCD are slightly larger due to the
requirement of estimating $P_1$ and $P_2$ at each stage when an immigration ball is
selected. In return, these estimates provide precise estimates of the efficacies of
the treatments, especially when delayed responses are present. This also explains
why the GDL rule and the DBCD surpass the DL rule in terms of the convergence
of the allocation proportions in such cases.
In Table 2, the optimal allocation target in (2.5) is used. All allocation
proportions are quite close to the pre-specified target. In addition, the two
choices of $C$ for the immigration rates, $a_1$ and $a_2$, which represent the addition
of roughly two treatment balls when an immigration ball is selected, do not yield
much difference in terms of the allocation proportions. In fact, several other
possible values of $C$ were tried and, as long as the number of balls added to the
urn remained less than 4, similar results were obtained and hence are not reported.
The immigration ball has two important functions: the first is to prevent the
possibility of extinction of a particular type of treatment ball; the second is to
add treatment balls to the urn according to the current estimates of $P_1$ and $P_2$.
Therefore, to allow the immigration ball to play these two roles continuously
during the allocation process, the principle is not to add so many treatment balls
to the urn that the chance of selecting an immigration ball becomes too small.
Simulation results in Table 2 also reveal that the DBCD’s performance is
comparable to that of the GDL rule. The DBCD has a very slight advantage in
accuracy in attaining the target allocation, but has slightly larger variances for
$n = 100$. However, a complete theoretical justification of the DBCD with delayed
responses, similar to the one provided for the GDL rule in this paper, is still
unavailable.
Finally, the use of the Bayesian estimates of $P_k$ ($k = 1, 2$) works very well.
We have also computed the final estimates of the success probabilities, and these
are always close to the actual values.
3. Asymptotic Properties of the GDL Rule
In this section several useful asymptotic properties for the GDL rule are
given. We consider only the case in which the numbers of the immigrated balls
$a_k$, $k = 1, 2$, are fixed. More complicated scenarios in which the $a_k$'s vary from time
to time are an interesting topic for future study.
Now, let $t_m$ be the entry time of the $m$th subject. Assume that $\{t_{m+1} - t_m;\ m \geq 1\}$
is a sequence of independent and identically distributed random
variables. The response time of the $m$th subject with treatment $k$ is denoted by
$r_m(k)$. Suppose $\{r_m(k);\ m \geq 1\}$ are sequences of independent random variables,
$k = 1, 2$. Further, let the response times be independent of the entry times. We
also assume that the draw, removal and addition of balls require no time, and
so the $m$th subject is randomized at time $t_m$. For the response time $r_m(k)$, we
have the following assumption.
Assumption 3.1. Let $\delta_k(m, n) = I\{r_m(k) > t_{m+n} - t_m\}$ be an indicator function
that takes the value 1 if the outcome of the $m$th subject on treatment $k$ occurs
after at least another $n$ subjects are randomized, and 0 otherwise. Suppose
for some constants $C > 0$ and $\gamma > 2$, $\mu_k(m, n) = P\{\delta_k(m, n) = 1\} \leq C n^{-\gamma}$,
$m, n = 1, 2, \ldots$, $k = 1, 2$.
Since the above probability is a decreasing function of $n$ with a power rate,
the chance is slim that too many patients arrive before a delayed response is
observed.
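As an added illustration (not part of the original argument), consider the exponential setting used in the simulations of Section 2.3: the response delay on treatment $k$ is exponential with mean $\lambda_k$, and the inter-arrival times are i.i.d. exponential with mean $\lambda_3$, independent of the delays. Conditioning on $t_{m+n} - t_m$, which is then a sum of $n$ independent inter-arrival times, gives

```latex
\[
\mu_k(m,n) = P\{ r_m(k) > t_{m+n} - t_m \}
 = E\big[ e^{-(t_{m+n} - t_m)/\lambda_k} \big]
 = \Big( E\big[ e^{-(t_2 - t_1)/\lambda_k} \big] \Big)^{n}
 = \Big( \frac{\lambda_k}{\lambda_k + \lambda_3} \Big)^{n} .
\]
```

A geometric rate is eventually smaller than $C n^{-\gamma}$ for every $\gamma > 2$, so Assumption 3.1 holds in this setting. For $(\lambda_1, \lambda_2, \lambda_3) = (5, 1, 1)$ the two probabilities are $(5/6)^n$ and $(1/2)^n$.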
Remark 3.1. For generalized Friedman’s urn models (also known as generalized
Pólya urn models) with delayed responses, the assumptions on the delay times have
been discussed by Bai, Hu and Rosenberger (2002) and Hu and Zhang (2004b).
Similar to the arguments in Bai, Hu and Rosenberger (2002), we can show that
Assumption 3.1 is satisfied if (i) the $\gamma$th moment of $r_m(k)$ exists and (ii) $E(t_{m+1} - t_m) > 0$
and $E|t_{m+1} - t_m|^{2\gamma} < \infty$. These two conditions can be easily verified in
applications.
Assumption 3.2. $\{D_{m,k};\ m \geq 1\}$, $k = 1, 2$, are two sequences of i.i.d. random
variables with $0 \leq p_k = E[D_{m,k}] < 1$ and $E|D_{m,k}|^p < \infty$ for any $p > 0$, $k = 1, 2$.
Let $\sigma_k^2 = \mathrm{Var}(D_{m,k})$ be the variance of the adding rules and $q_k = 1 - p_k$,
$k = 1, 2$.
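Theorem 3.1. Suppose Assumptions 3.1 and 3.2 are satisfied. Let $v_k$, $k = 1, 2$,
be defined in (2.4). Then there exists a standard Brownian motion $\{W(t);\ t \geq 0\}$
such that for any $\delta > 0$, $N_{n,1} - n v_1 = \sigma W(n) + o(n^{(\gamma+1)/(3\gamma)+\delta})$ a.s., and
$N_{n,2} - n v_2 = -\sigma W(n) + o(n^{(\gamma+1)/(3\gamma)+\delta})$ a.s., where
$$\sigma^2 = \frac{a_1 a_2 (a_2 q_2 \sigma_1^2 + a_1 q_1 \sigma_2^2)}{(a_2 q_1 + a_1 q_2)^3}. \tag{3.1}$$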
The proof of the theorem will be given in the last section. By the properties
of Wiener processes, the following is an immediate corollary of the theorem.
Corollary 3.1. Under Assumptions 3.1 and 3.2,
$$\frac{N_{n,k}}{n} - v_k = O\left( \sqrt{\frac{\log \log n}{n}} \right) \quad \text{a.s.}, \quad k = 1, 2, \tag{3.2}$$
$$\sqrt{n} \left( \frac{N_{n,k}}{n} - v_k \right) \xrightarrow{D} N(0, \sigma^2), \quad k = 1, 2, \tag{3.3}$$
where $v_k$, $k = 1, 2$, are defined in (2.4), and $\sigma^2$ is defined in (3.1).
Equation (3.2) gives strong consistency with its rate of convergence for the
proportions $N_{n,k}/n$, $k = 1, 2$. Compared with (2.2), where the result is given
through a “virtual” time, (3.3) provides the direct asymptotic distributions of
the proportions. The asymptotic distributions and the asymptotic variance can
be used to compare with other adaptive designs (Hu and Rosenberger (2003)).
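To make such comparisons concrete, the variance (3.1) and the large-sample standard deviation of $N_{n,1}/n$ implied by (3.3) can be evaluated numerically. The following is a sketch; the function names and the argument layout are illustrative assumptions.

```python
import math


def gdl_asymptotic_variance(a, p, d_var):
    """Evaluate sigma^2 from (3.1).

    a = (a1, a2): immigration counts; p = (p1, p2) with p_k = E[D_{m,k}] < 1;
    d_var = (Var(D_{m,1}), Var(D_{m,2})). Argument names are hypothetical.
    """
    a1, a2 = a
    q1, q2 = 1.0 - p[0], 1.0 - p[1]
    s1, s2 = d_var
    numerator = a1 * a2 * (a2 * q2 * s1 + a1 * q1 * s2)
    denominator = (a2 * q1 + a1 * q2) ** 3
    return numerator / denominator


def approx_sd_of_proportion(a, p, d_var, n):
    """Large-sample approximation sqrt(sigma^2 / n) to the s.d. of N_{n,1}/n."""
    return math.sqrt(gdl_asymptotic_variance(a, p, d_var) / n)


if __name__ == "__main__":
    # DL special case: a1 = a2 = 1 and binary D, so Var(D) = p_k * (1 - p_k).
    p = (0.5, 0.5)
    d_var = tuple(pk * (1.0 - pk) for pk in p)
    print(gdl_asymptotic_variance((1.0, 1.0), p, d_var))
    print(approx_sd_of_proportion((1.0, 1.0), p, d_var, 500))
```

For this DL special case with $n = 500$ the approximation is about 0.022, in line with the simulated standard deviation reported for the DL rule in Table 1.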
Remark 3.2. From Corollary 3.1, the asymptotic properties of the GDL process
do not depend on the delay mechanism as long as Assumption 3.1 is satisfied.
However, in Theorem 3.1, the convergence rate of the error depends on $\gamma$, which
is affected by the degree of the delayed responses.
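Example 3.1. Binary response: Based on the result of Hu, Rosenberger and
Zhang (2006), we can calculate the lower bound of the asymptotic variance for
the allocation proportion $v_1$ given as in (2.4). For the case with dichotomous
outcomes, $\sigma_k^2 = p_k q_k$, $k = 1, 2$. Here $p_k = P_k$, $q_k = 1 - P_k$, and $P_k$ is the
probability of a success on treatment $k$, $k = 1, 2$. Let $p = (p_1, p_2)$ and
$$f(y) = \frac{a_1/(1 - y_1)}{a_1/(1 - y_1) + a_2/(1 - y_2)}.$$
According to Theorem 1 of Hu, Rosenberger and Zhang (2006), the lower bound
of the asymptotic variance is
$$\sigma^2_{\min}(p) := \left. \frac{\partial f}{\partial y} \right|_p I^{-1}(p) \left( \left. \frac{\partial f}{\partial y} \right|_p \right)',$$
where $I(p) = \mathrm{diag}\left( \frac{v_1}{p_1 q_1}, \frac{v_2}{p_2 q_2} \right)$ is Fisher’s information matrix. Taking derivatives
of $f$ we find that
$$\left. \frac{\partial f}{\partial y} \right|_p = \left( \left. \frac{\partial f}{\partial y_1} \right|_p, \left. \frac{\partial f}{\partial y_2} \right|_p \right) = \left( \frac{v_1 v_2}{q_1}, -\frac{v_1 v_2}{q_2} \right).$$
It follows that $\sigma^2_{\min}(p) = \sigma^2$ by some elementary calculation, where $\sigma^2$ is defined
in (3.1). Based on Corollary 3.1, the GDL rule attains this lower bound and
hence it is asymptotically the most powerful design.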
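The elementary calculation can be spelled out as follows. This is a sketch using only (2.4), (3.1) and the quantities of this example, with the shorthand $S = a_1 q_2 + a_2 q_1$ introduced here for convenience. Only the squares of the two partial derivatives enter the quadratic form, and each has magnitude $v_1 v_2 / q_k$.

```latex
\begin{align*}
v_1 &= \frac{a_1 q_2}{S}, \qquad v_2 = \frac{a_2 q_1}{S}, \qquad S := a_1 q_2 + a_2 q_1, \\
\sigma^2_{\min}(p)
 &= \Big( \frac{v_1 v_2}{q_1} \Big)^2 \frac{p_1 q_1}{v_1}
  + \Big( \frac{v_1 v_2}{q_2} \Big)^2 \frac{p_2 q_2}{v_2}
  = v_1 v_2 \Big( \frac{v_2 p_1}{q_1} + \frac{v_1 p_2}{q_2} \Big) \\
 &= \frac{a_1 a_2 q_1 q_2}{S^2} \cdot \frac{a_2 p_1 + a_1 p_2}{S}
  = \frac{a_1 a_2 \,( a_2 q_2 \, p_1 q_1 + a_1 q_1 \, p_2 q_2 )}{(a_2 q_1 + a_1 q_2)^3}
  = \sigma^2 .
\end{align*}
```

The last equality is (3.1) with $\sigma_k^2 = p_k q_k$.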
When $a_1 = a_2$, and $D_{m,k} = 1$ for success and $D_{m,k} = 0$ for failure when type
$k$ treatment is assigned, the GDL rule becomes the DL rule. The asymptotic
variance, $n \mathrm{Var}(N_{n,1}/n)$, is
$$\sigma^2_{DL} = \frac{q_1 q_2 (p_1 + p_2)}{(q_1 + q_2)^3} = \frac{Q_1 Q_2 (P_1 + P_2)}{(Q_1 + Q_2)^3},$$
which is the smallest among all the adaptive designs considered in Hu and Rosenberger
(2003).
4. Discussion
One important application of the GDL rule in clinical studies is that it can
be used for continuous responses. For instance, we may apply the GDL rule
to the example studied in Section 8 of Eisele and Woodroofe (1995), where the
responses are normally distributed and the desired target proportion is the popular
Neyman allocation. Similar to Remark 2.1, we can choose the $a_k$ and $D_{m,k}$
sequentially to target the desired proportion. It would be interesting to compare
this design with the doubly adaptive biased coin designs (Hu and Zhang
(2004a)) in which the allocation probabilities are functions of sequential estimators
of unknown parameters, and the sequential estimation-adjusted urn models
(Zhang, Hu and Cheung (2006)). However, when the $a_k$ depend on the process
(as indicated in Remark 2.1), the asymptotic properties of the allocation proportion
$N_{n,1}/n$ are unknown. This is an interesting future research topic.
For the randomized play-the-winner rule with delayed responses, Wei (1988)
suggested updating the urn when responses become available. For a generalized
Friedman’s urn model (the randomized play-the-winner rule is a special case) with
delayed responses, the limiting distribution of the urn composition was derived
in Bai, Hu and Rosenberger (2002). Further, Hu and Zhang (2004b) obtained
the limiting distribution of the allocation proportion. Both papers showed that
the delayed responses do not affect the asymptotic properties of the generalized
Friedman's urn model. Here we obtain similar results for the GDL rule. Nevertheless, these arguments are valid only for large samples. In practice, the delay mechanism is important and should not be ignored, as indicated by our simulation findings.
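To make the role of the delay mechanism concrete, the following is a minimal simulation sketch of a two-arm GDL urn with binary responses. It is not the authors' simulation code. Several choices are assumptions made only for illustration: the delay of a response is measured in whole draws and is exponentially distributed, $D = 1$ balls are added for a success and none for a failure, and the function and parameter names (`simulate_gdl`, `mean_delay`, and so on) are hypothetical.

```python
import random
from collections import defaultdict


def simulate_gdl(n_subjects, p, a=(1, 1), z0=(1, 1, 1), mean_delay=3.0, seed=0):
    """Simulate a two-arm GDL urn with binary, delayed responses.

    p          -- success probabilities (p1, p2)
    a          -- balls (a1, a2) added when an immigration ball is drawn
    z0         -- initial urn (immigration, type 1, type 2)
    mean_delay -- mean response delay, in draws (illustrative assumption)
    Returns (N1, N2), the numbers of subjects assigned to each treatment.
    """
    rng = random.Random(seed)
    z = list(z0)
    pending = defaultdict(lambda: [0, 0])  # draw index -> balls to add per arm
    counts = [0, 0]
    draw = 0
    while sum(counts) < n_subjects:
        draw += 1
        # Draw a ball with probability proportional to the urn composition.
        k = rng.choices((0, 1, 2), weights=z)[0]
        if k == 0:
            # Immigration ball: it is replaced, and a_k balls of each type are added.
            z[1] += a[0]
            z[2] += a[1]
        else:
            # Treatment ball: assign treatment k; the ball is not replaced.
            z[k] -= 1
            counts[k - 1] += 1
            if rng.random() < p[k - 1]:
                # A success returns one ball, but only once the response is observed.
                delay = int(rng.expovariate(1.0 / mean_delay)) if mean_delay > 0 else 0
                pending[draw + delay][k - 1] += 1
        # Add the balls for responses that become available after this draw.
        arrived = pending.pop(draw, None)
        if arrived is not None:
            z[1] += arrived[0]
            z[2] += arrived[1]
    return tuple(counts)
```

With `p = (0.7, 0.5)` and `a = (1, 1)`, the limiting proportion on treatment 1 is $v_1 = (a_1/q_1)/(a_1/q_1 + a_2/q_2) = 0.625$. Running the sketch with a large `n_subjects` and with several values of `mean_delay` gives a rough feel for how quickly $N_{n,1}/n$ settles near $v_1$ under different delays.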
5. Proofs
Theorem 3.1 is proved in this section. Recall that $Z_n = (Z_{n,0}, Z_{n,1}, Z_{n,2})$ represents the numbers of balls after $n$ draws and $|Z_n^+| = Z_{n,0}^+ + Z_{n,1}^+ + Z_{n,2}^+$. Because every immigration ball is replaced, $Z_{n,0}^+ = Z_{n,0} = Z_{0,0}$ for all $n$. Let $X_n$ be the result of the $n$th draw, where $X_{n,k} = 1$ if the selected ball is of type $k$ and $X_{n,k} = 0$ otherwise, $k = 0, 1, 2$. Further, let $N_n^* = (N^*_{n,0}, N^*_{n,1}, N^*_{n,2}) = \sum_{m=1}^{n} X_m$, so $N^*_{n,k}$ is the number of selected type $k$ balls in the first $n$ draws. Let $u_n = \max\{m : N^*_{m,1} + N^*_{m,2} \le n\}$. Then $u_n$ is the total number of draws (of immigration and treatment balls) made while the first $n$ subjects are assigned, and $N_{n,k} = N^*_{u_n,k}$, $k = 1, 2$.
Let $I_k(m, n)$ be the indicator function which takes the value 1 if the outcome $Y_{m,k}$ on treatment $k$ of the subject assigned at the $m$th draw occurs after the $(m+n)$th draw and before the $(m+n+1)$th draw, $k = 1, 2$. Remember that, when $Y_{m,k}$ occurs, we add $D_{m,k} = D(Y_{m,k})$ balls of type $k$ into the urn. So, for given $m$ and $n$, if $I_k(m, n) = 1$, we add $X_{m,k} D_{m,k}$ balls of type $k$ to the urn.
Consequently, for $m = 0$, $I_k(n-m, m) = I_k(n, 0)$ indicates that the outcome on treatment $k$ assigned at time $n$ occurs after the $n$th draw and before the $(n+1)$th draw; ...; for $m = n-1$, $I_k(n-m, m) = I_k(1, n-1)$ indicates that the outcome on treatment $k$ assigned at time 1 occurs after the $n$th draw and before the $(n+1)$th draw. Hence, after the $n$th draw and before the $(n+1)$th draw, the numbers of balls of each type added according to the outcomes are
\[ W_{n,k} = \sum_{m=0}^{n-1} I_k(n-m, m)\, X_{n-m,k} D_{n-m,k} = \sum_{m=1}^{n} I_k(m, n-m)\, X_{m,k} D_{m,k}, \quad k = 1, 2. \]
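The change in the number of type $k$ balls after $n$ draws from the time of the $(n-1)$th draw is $Z_{n,k} - Z_{n-1,k} = a_k X_{n,0} - X_{n,k} + W_{n,k}$, $k = 1, 2$. Recall that $a_k$ here is the number of added type $k$ balls when an immigration ball is drawn. So, for $k = 1, 2$, the net change in the number of type $k$ balls after $n$ draws is
\begin{align*}
Z_{n,k} - Z_{0,k} &= a_k \sum_{j=1}^{n} X_{j,0} - \sum_{j=1}^{n} X_{j,k} + \sum_{j=1}^{n} W_{j,k} \\
&= a_k \sum_{m=1}^{n} X_{m,0} - \sum_{m=1}^{n} X_{m,k} + \sum_{m=1}^{n} \sum_{j=m}^{n} X_{m,k} D_{m,k} I_k(m, j-m) \\
&= a_k \sum_{m=1}^{n} X_{m,0} - \sum_{m=1}^{n} X_{m,k} + \sum_{m=1}^{n} \sum_{j=m}^{\infty} X_{m,k} D_{m,k} I_k(m, j-m) - \sum_{m=1}^{n} \sum_{j=n+1}^{\infty} X_{m,k} D_{m,k} I_k(m, j-m) \\
&= a_k \sum_{m=1}^{n} X_{m,0} + \sum_{m=1}^{n} X_{m,k} (D_{m,k} - 1) - \sum_{m=1}^{n} \sum_{j=n+1}^{\infty} X_{m,k} D_{m,k} I_k(m, j-m) \\
&=: a_k \sum_{m=1}^{n} X_{m,0} + \sum_{m=1}^{n} X_{m,k} (D_{m,k} - 1) - R_{n,k}. \tag{5.1}
\end{align*}
The fourth equality uses $\sum_{j=m}^{\infty} I_k(m, j-m) = 1$, since each response occurs exactly once. That is,
\[ \Delta Z_{n,k} = a_k X_{n,0} + X_{n,k} (D_{n,k} - 1) - \Delta R_{n,k}, \quad k = 1, 2, \tag{5.2} \]
where $\Delta$ denotes the differencing operator of a sequence $\{z_n\}$, that is, $\Delta z_n = z_n - z_{n-1}$. From (5.1), it follows that
\begin{align*}
Z_{n,k} - Z_{0,k} &= a_k N^*_{n,0} - q_k N^*_{n,k} + \sum_{m=1}^{n} X_{m,k} (D_{m,k} - E[D_{m,k}]) - R_{n,k} \\
&=: a_k N^*_{n,0} - q_k N^*_{n,k} + M_{n,k} - R_{n,k}, \quad k = 1, 2. \tag{5.3}
\end{align*}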
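As a worked step, the passage from (5.1) to (5.3) can be spelled out as follows. The sketch assumes, as (5.3) itself implies, that $q_k = 1 - E[D_{m,k}]$ does not depend on $m$. Centering $D_{m,k}$ at its mean splits the middle term of (5.1):
\[ \sum_{m=1}^{n} X_{m,k} (D_{m,k} - 1) = \sum_{m=1}^{n} X_{m,k} (D_{m,k} - E[D_{m,k}]) - (1 - E[D_{m,k}]) \sum_{m=1}^{n} X_{m,k} = M_{n,k} - q_k N^*_{n,k}, \]
while the first term of (5.1) is simply $a_k \sum_{m=1}^{n} X_{m,0} = a_k N^*_{n,0}$. In the binary case of Example 3.1, $E[D_{m,k}] = p_k$, which recovers $q_k = 1 - p_k$.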
We prove Theorem 3.1 by showing that $R_{n,k}$ and $Z_{n,k}$ can be neglected, and the major term $M_{n,k}$ can be approximated by a Wiener process. Notice that $Z_{n,k}$ is a function of $\{I_k(m, j)\}$. We show that $Z_{n,k}$ can be neglected by using the fact that $E[I_k(n, j)]$ decays very rapidly. So we first replace Assumption 3.1 by the following one on $I_k(n, j)$.
Condition A. For some $\varphi > 1$, $\sum_{j=n}^{\infty} E[I_k(m, j)] \le C n^{-\varphi}$, for all $n$, $m$ and $k = 1, 2$.
The summation in Condition A is the probability of the event that the subject who is assigned to treatment $k$ at the $m$th draw responds after at least another $n$ draws, and it is required that this probability decays at a power rate, similar to Assumption 3.1. The following claim provides the connection.
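Claim. Assumption 3.1 implies Condition A with $\varphi = \gamma - 1 - \varepsilon$ for any $\varepsilon > 0$.
Proof. Let $N^*_m = N^*_{m,1} + N^*_{m,2}$. Notice that $E[I_k(m, n)]$ is the probability of the event that the $N^*_m$th subject (who is assigned after the $m$th ball is drawn) on treatment $k$ responds after the $(m+n)$th draw and before the $(m+n+1)$th draw. So $E[I_k(m, n) \mid N^*_m = p] \le P(E_1 \mid N^*_m = p) + P(E_2 \mid N^*_m = p)$, where $E_1$ is the event that the $p$th subject responds after at least another $n^{1-\varepsilon}$ subjects arrive, and $E_2$ is the event that there are at least $n - n^{1-\varepsilon}$ draws of type 0 balls from the $m$th draw to the $(m+n)$th draw. The event $E_1$ depends only on the response time of the $p$th subject and the waiting times for future subjects. However, the event $\{N^*_m = p\}$ depends only on past draws and assignments. So, $E_1$ and $\{N^*_m = p\}$ are independent. It follows that $P(E_1 \mid N^*_m = p) = P(E_1) \le C n^{-\gamma(1-\varepsilon)}$ by Assumption 3.1. For $P(E_2 \mid N^*_m = p)$, we consider the event $E_3$ that the largest run of "1"s in $X_{m,0}, \ldots, X_{m+n,0}$ is at least $n^{\varepsilon}$. Notice that, on the event $E_3^C$, there are at least $n/n^{\varepsilon}$ zeros in $X_{m,0}, \ldots, X_{m+n,0}$, and then at most $n - n/n^{\varepsilon}$ ones. So, $E_2$ does not occur. It follows that $P(E_2 \mid N^*_m = p) \le P(E_3 \mid N^*_m = p)$. Hence we conclude that
\begin{align*}
E[I_k(m, n)] &\le C n^{-\gamma(1-\varepsilon)} + P(E_3) \\
&\le C n^{-\gamma(1-\varepsilon)} + \sum_{i=m}^{m+n} P\{X_{i,0} = \cdots = X_{i+[n^{\varepsilon}],0} = 1\}.
\end{align*}
On the other hand,
\begin{align*}
P\{X_{i,0} = \cdots = X_{i+[n^{\varepsilon}],0} = 1\}
&= E\Big[ I\{X_{i,0} = \cdots = X_{i+[n^{\varepsilon}]-1,0} = 1\}\, P\big[X_{i+[n^{\varepsilon}],0} = 1 \mid \mathcal{F}_{i+[n^{\varepsilon}]-1}\big] \Big] \\
&= E\Big[ I\{X_{i,0} = \cdots = X_{i+[n^{\varepsilon}]-1,0} = 1\}\, \frac{Z_{0,0}}{|Z^+_{i+[n^{\varepsilon}]-1}|} \Big] \\
&\le P\{X_{i,0} = \cdots = X_{i+[n^{\varepsilon}]-1,0} = 1\}\, \frac{Z_{0,0}}{Z_{0,0} + A([n^{\varepsilon}] - 1)},
\end{align*}
since at each stage from stage $i$ to $i + [n^{\varepsilon}] - 1$ at least $A = a_1 + a_2$ balls are added to the urn and no ball is removed, where $\mathcal{F}_n = \sigma(X_1, \ldots, X_n, Y_1, \ldots, Y_n)$ is the history sigma field. Here and in the remainder of this paper, we take $Y_n = (Y_{n,1}, \ldots, Y_{n,K})$, $n \ge 1$. So,
\[ P\{X_{i,0} = \cdots = X_{i+[n^{\varepsilon}],0} = 1\} \le \prod_{j=1}^{[n^{\varepsilon}]} \frac{Z_{0,0}}{Z_{0,0} + A(j-1)} \le C \exp\{-n^{\varepsilon}\}. \]
It follows that $E[I_k(m, n)] \le C n^{-\gamma(1-\varepsilon)} + C n \exp\{-n^{\varepsilon}\} \le C n^{-\gamma(1-\varepsilon)}$. Hence $\sum_{j=n}^{\infty} E[I_k(m, j)] \le C n^{-\gamma(1-\varepsilon)+1}$.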
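The product bound used in the proof of the claim decays faster than exponentially in the run length, which a short numerical sketch can illustrate. The helper names below are hypothetical, and `run_length` plays the role of the integer part of $n^{\varepsilon}$; the values of `z00` and `total_added` (standing for $Z_{0,0}$ and $A = a_1 + a_2$) are chosen only for illustration.

```python
import math


def run_probability_bound(z00, total_added, run_length):
    """Product over j = 1..run_length of z00 / (z00 + total_added * (j - 1))."""
    prod = 1.0
    for j in range(1, run_length + 1):
        prod *= z00 / (z00 + total_added * (j - 1))
    return prod


def smallest_constant(z00, total_added, max_run=50):
    """Smallest C with bound(m) <= C * exp(-m) for every m = 1..max_run."""
    return max(
        run_probability_bound(z00, total_added, m) * math.exp(m)
        for m in range(1, max_run + 1)
    )
```

For `z00 = 1` and `total_added = 2` the product over three terms is $1 \cdot \tfrac{1}{3} \cdot \tfrac{1}{5}$, and the ratio of the product to $e^{-m}$ is largest at $m = 1$, so a constant $C = e$ already suffices in that case.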
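The next lemma gives the convergence rate of the remainders $R_{n,k}$, $k = 1, 2$.
Lemma 5.1. Assume $E[|D_{m,k}|^p] < \infty$ for any $p > 0$ and Condition A is satisfied. Then for any $\delta > 0$, we have
\[ E\Big[\max_{m \le n} |R_{m,k}|\Big] = o\big(n^{\frac{1}{\varphi+1}+\delta}\big), \quad k = 1, 2, \tag{5.4} \]
\[ |R_{n,k}| = o\big(n^{\frac{1}{\varphi+1}+\delta}\big) \ \text{a.s.}, \quad k = 1, 2. \tag{5.5} \]
Proof. (5.5) is implied by (5.4) if we notice that, for every $\varepsilon > 0$,
\[ \sum_{i=1}^{\infty} P\Big( \max_{2^i \le n \le 2^{i+1}} \frac{|R_{n,k}|}{n^{\frac{1}{\varphi+1}+2\delta}} \ge \varepsilon \Big) \le C \sum_{i=1}^{\infty} 2^{-i\delta} < \infty. \]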
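Now we need to verify (5.4). Fix $k$. For any $1 \le i \le n$,
\begin{align*}
|R_{i,k}| &= \Big| \sum_{m=1}^{i} \sum_{j=i-m+1}^{\infty} X_{m,k} I_k(m, j) D_{m,k} \Big| \\
&\le \sum_{m=1}^{i} \sum_{j=i-m+1}^{\infty} I_k(m, j) |D_{m,k}| I\{|D_{m,k}| \le n^{\delta/3}\} + \sum_{m=1}^{i} |D_{m,k}| I\{|D_{m,k}| > n^{\delta/3}\} \\
&\le n^{\delta/3} \sum_{m=1}^{i} \sum_{j=i-m+1}^{\infty} I_k(m, j) + \sum_{m=1}^{n} |D_{m,k}| I\{|D_{m,k}| > n^{\delta/3}\}.
\end{align*}
The expectation of the second term does not exceed
\[ n E\big[|D_{1,k}| I\{|D_{1,k}| > n^{\delta/3}\}\big] \le n^{1 - \delta(p-1)/3} E[|D_{1,k}|^p] \le C \]
whenever $p \ge 1 + 3/\delta$. So, it is enough to show that
\[ E\Big[\max_{i \le n} \overline{R}_{i,k}\Big] = O\big(n^{\frac{1}{\varphi+1}+\frac{\delta}{2}}\big), \tag{5.6} \]
where $\overline{R}_{i,k} = \sum_{m=1}^{i} \sum_{j=i-m+1}^{\infty} I_k(m, j)$. Let $1 \le P \le n$ be an integer whose value will be specified later. Then $\overline{R}_{i,k} \le P$ if $i \le P$. For $P \le i \le n$,
\begin{align*}
\overline{R}_{i,k} &= \sum_{m=i-P+1}^{i} \sum_{j=i-m+1}^{\infty} I_k(m, j) + \sum_{m=1}^{i-P} \sum_{j=i-m+1}^{\infty} I_k(m, j) \\
&\le P + \sum_{m=1}^{i-P} \sum_{j=P}^{\infty} I_k(m, j) \le P + \sum_{m=1}^{n} \sum_{j=P}^{\infty} I_k(m, j).
\end{align*}
It follows that $E[\max_{i \le n} \overline{R}_{i,k}] \le P + \sum_{m=1}^{n} \sum_{j=P}^{\infty} E[I_k(m, j)] \le P + C n P^{-\varphi}$. Choosing $P = [n^{1/(\varphi+1)+\delta/2}]$ yields (5.6).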
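The choice of $P$ can be read as balancing the two terms of the bound $P + C n P^{-\varphi}$. As a short check of the exponents (a sketch, ignoring the integer part and constants):
\[ n P^{-\varphi} \approx n^{\,1 - \frac{\varphi}{\varphi+1} - \frac{\varphi\delta}{2}} = n^{\frac{1}{\varphi+1} - \frac{\varphi\delta}{2}} \le n^{\frac{1}{\varphi+1} + \frac{\delta}{2}}, \]
so both terms are $O(n^{1/(\varphi+1)+\delta/2})$. Multiplying by the truncation level $n^{\delta/3}$ from the first term of the bound on $|R_{i,k}|$ gives
\[ n^{\frac{\delta}{3}} \cdot n^{\frac{1}{\varphi+1}+\frac{\delta}{2}} = n^{\frac{1}{\varphi+1}+\frac{5\delta}{6}} = o\big(n^{\frac{1}{\varphi+1}+\delta}\big), \]
which is the rate claimed in (5.4).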
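Lemma 5.2. Let $\mathcal{F}_n = \sigma(X_1, \ldots, X_n, Y_1, \ldots, Y_n)$. Let $V_{n,0} = \sum_{m=1}^{n} (X_{m,0} - E[X_{m,0} \mid \mathcal{F}_{m-1}])$ and $V_{n,k} = \sum_{m=1}^{n} \{X_{m,k}(D_{m,k} - 1) - E[X_{m,k}(D_{m,k} - 1) \mid \mathcal{F}_{m-1}]\}$, $k = 1, 2$. Assume $E[|D_{m,k}|^p] < \infty$ for $p \ge 2$. Then there exists a constant $C_p > 0$ such that the martingales $\{V_{n,k}, \mathcal{F}_n; n \ge 1\}$, $k = 0, 1, 2$, satisfy
\[ E\Big[\max_{i \le n} |V_{m+i,k} - V_{m,k}|^p\Big] \le C_p n^{p/2} \quad \text{for all } m \text{ and } n, \quad k = 0, 1, 2. \tag{5.7} \]
Proof. Notice that $|\Delta V_{n,0}| \le 1$ and
\[ E\big[|\Delta V_{n,k}|^p \mid \mathcal{F}_{n-1}\big] \le 2^{p-1}(1 + E[|D_{n,k}|^p]) \le C_p, \quad k = 1, 2. \]
(5.7) follows from the Rosenthal type inequality.
Let $U_{n,k} = a_k V_{n,0} + V_{n,k}$, $k = 1, 2$. $U_{n,k}$ is the sum of conditionally centered changes in the number of type $k$ balls in the first $n$ draws, $k = 1, 2$. It can be shown that $\{U_{n,k}, \mathcal{F}_n; n \ge 1\}$ is a martingale satisfying an inequality similar to (5.7), $k = 1, 2$. The next lemma gives the convergence rate of the urn composition $Z_n$.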
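Lemma 5.3. Under Assumption 3.2 and Condition A, for each $k = 1, 2$ and any $\delta > 0$,
\[ \max_{j \le n} Z_{j,k} \le Z_{0,k} \vee \frac{a_k Z_{0,0}}{q_k} + 2 \max_{j \le n} |U_{j,k}| + \max_{j \le n} |R_{j,k}|, \tag{5.8} \]
\[ E|Z_{n,k}| = o\big(n^{\frac{1}{\varphi+1}+\delta}\big), \tag{5.9} \]
\[ \max_{j \le n} |Z_{j,k}| = o\big(n^{\frac{1}{3}+\frac{1}{3\varphi+3}+\delta}\big) \ \text{in } L_1, \tag{5.10} \]
\[ Z_{n,k} = o\big(n^{\frac{1}{3}+\frac{1}{3\varphi+3}+\delta}\big) \ \text{a.s.} \tag{5.11} \]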
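Proof. According to (5.2), it is obvious that
\begin{align*}
Z_{n,k} &= Z_{n-1,k} + \frac{a_k Z^+_{n-1,0} - q_k Z^+_{n-1,k}}{|Z^+_{n-1}|} + \Delta U_{n,k} - \Delta R_{n,k} \\
&= Z_{n-1,k} + \frac{a_k Z_{0,0} - q_k Z^+_{n-1,k}}{|Z^+_{n-1}|} + \Delta U_{n,k} - \Delta R_{n,k}. \tag{5.12}
\end{align*}
Then
\begin{align*}
Z_{n,k} &\le Z_{n-1,k} + \Delta U_{n,k} - \Delta R_{n,k}, && \text{if } Z_{n-1,k} \ge a_k \frac{Z_{0,0}}{q_k}; \tag{5.13} \\
Z_{n,k} &\le Z_{n-1,k} + a_k + \Delta U_{n,k} - \Delta R_{n,k}, && \text{if } Z_{n-1,k} < a_k \frac{Z_{0,0}}{q_k}.
\end{align*}
Let $S_n = \max\{1 \le j \le n : Z_{j,k} < a_k Z_{0,0}/q_k\}$, where $\max(\emptyset) = 0$. Then, according to (5.13),
\begin{align*}
Z_{n,k} &\le Z_{n-1,k} + \Delta U_{n,k} - R_{n,k} + R_{n-1,k} \le \cdots \\
&\le Z_{S_n,k} + \Delta U_{S_n+1,k} + \cdots + \Delta U_{n,k} - R_{n,k} + R_{S_n,k} \\
&\le Z_{0,k} \vee \Big\{ a_k \frac{Z_{0,0}}{q_k} \Big\} + U_{n,k} - U_{S_n,k} - R_{n,k} + R_{S_n,k} \\
&\le Z_{0,k} \vee \Big\{ a_k \frac{Z_{0,0}}{q_k} \Big\} + U_{n,k} - U_{S_n,k} + \max_{m \le n} |R_{m,k}|. \tag{5.14}
\end{align*}
(5.8) is proved. Notice that $S_n \le n$ is a stopping time. It follows that $E U_{n,k} = E U_{S_n,k}$. By (5.4) and (5.14) we conclude that $E Z_{n,k} \le o(n^{1/(\varphi+1)+\delta})$. (5.9) is proved by the fact that $Z_{n,k} \ge -1$ and $|Z_{n,k}| = Z_{n,k} + 2 Z^-_{n,k}$.
Next, we verify (5.11). Fix $m$. By replacing $Z_{j,k}$ with $Z_{m+j,k}$ in the definition of the stopping time $S_n$, with arguments similar to those used in showing (5.8), we can show that
\[ \max_{0 \le i \le n} Z_{i+m,k} \le Z_{m,k} \vee \frac{a_k Z_{0,0}}{q_k} + 2 \max_{0 \le i \le n} |U_{m+i,k} - U_{m,k}| + \max_{j \le n+m} |R_{j,k}|. \tag{5.15} \]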
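Now, for each $p \ge 2$ and $0 < t < 1/2$, if $n \ge 1/(4t)$, then by (5.7), (5.9) and (5.15),
\begin{align*}
E \max_{m \le n} Z_{m,k} &\le E\Big[\max_i \max_{i[nt] \le m \le (i+1)[nt]} Z_{m,k}\Big] \\
&\le a_k \frac{Z_{0,0}}{q_k} + \sum_i E\big[|Z_{i[nt],k}|\big] + E\Big[\max_{j \le n} |R_{j,k}|\Big] + 2 E\Big[\max_i \max_{i[nt] \le m \le (i+1)[nt]} |U_{m,k} - U_{i[nt],k}|\Big] \\
&\le C\big\{ t^{-1} n^{\frac{1}{\varphi+1}+\delta} + n^{\frac{1}{\varphi+1}+\delta} \big\} + 2 \Big( E\Big[\max_i \max_{i[nt] \le m \le (i+1)[nt]} |U_{m,k} - U_{i[nt],k}|^p\Big] \Big)^{1/p} \\
&\le C\big\{ t^{-1} n^{\frac{1}{\varphi+1}+\delta} + n^{\frac{1}{\varphi+1}+\delta} \big\} + 2 \Big( \sum_i E\Big[\max_{i[nt] \le m \le (i+1)[nt]} |U_{m,k} - U_{i[nt],k}|^p\Big] \Big)^{1/p} \\
&\le C\Big\{ t^{-1} n^{\frac{1}{\varphi+1}+\delta} + n^{\frac{1}{\varphi+1}+\delta} + \Big( \sum_i ([nt])^{p/2} \Big)^{1/p} \Big\} \\
&\le C\big\{ t^{-1} n^{\frac{1}{\varphi+1}+\delta} + t^{\frac{1}{2}-\frac{1}{p}} n^{\frac{1}{2}} \big\},
\end{align*}
where the sums and maximums are taken over $\{i \ge 0 : i[nt] \le n\}$. Here $C > 0$ is a constant and does not depend on $t$ and $n$. Notice that $Z_{m,k} \ge -1$. If $t = n^{-1/3+2/(3\varphi+3)}$, we have
\[ E\Big[\max_{m \le n} |Z_{m,k}|\Big] \le 2 + E\Big[\max_{m \le n} Z_{m,k}\Big] \le C n^{\frac{1}{3}+\frac{1}{3(\varphi+1)}+\delta} + C n^{\frac{1}{3}+\frac{1}{3(\varphi+1)}+\frac{1}{p}\left(\frac{1}{3}-\frac{2}{3(\varphi+1)}\right)}. \]
Choosing $p$ such that $(1/p)(1/3 - 2/[3(\varphi+1)]) \le \delta$ yields (5.10) immediately. With the same arguments as in showing (5.5) from (5.4), (5.11) can be derived easily from (5.10) and the Borel-Cantelli Lemma. The proof of the lemma is now complete. (5.11) and (5.4) indicate that the terms $Z_{n,k}$ and $R_{n,k}$ in (5.3) can be neglected.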
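The exponents produced by the choice $t = n^{-1/3+2/(3\varphi+3)}$ in the proof of Lemma 5.3 can be checked term by term. This is only a bookkeeping sketch of the two terms in the final bound:
\[ t^{-1} n^{\frac{1}{\varphi+1}+\delta} = n^{\frac{1}{3}-\frac{2}{3(\varphi+1)}+\frac{1}{\varphi+1}+\delta} = n^{\frac{1}{3}+\frac{1}{3(\varphi+1)}+\delta}, \]
\[ t^{\frac{1}{2}-\frac{1}{p}} n^{\frac{1}{2}} = n^{\frac{1}{2}-\frac{1}{6}+\frac{1}{3(\varphi+1)}+\frac{1}{p}\left(\frac{1}{3}-\frac{2}{3(\varphi+1)}\right)} = n^{\frac{1}{3}+\frac{1}{3(\varphi+1)}+\frac{1}{p}\left(\frac{1}{3}-\frac{2}{3(\varphi+1)}\right)}. \]
So this $t$ equalizes the leading powers of the two terms. The exponent of $t$ is negative precisely because $\varphi > 1$ in Condition A, so $t \to 0$ and the requirement $t < 1/2$ holds for large $n$.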
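Now we begin the proof of Theorem 3.1. Let $s = a_1/q_1 + a_2/q_2$. Then $v_k = (a_k/q_k)/s$, $k = 1, 2$. Let $\mathcal{A}_n = \sigma(X_1, \ldots, X_n, X_{n+1}, Y_1, \ldots, Y_n)$ and $M_{n,k} = \sum_{m=1}^{n} X_{m,k}(D_{m,k} - E D_{m,k})$, $k = 1, 2$. Then $\{(M_{n,1}, M_{n,2}), \mathcal{A}_n; n \ge 1\}$ is a martingale with
\[ \sum_{m=1}^{n} E\big[(\Delta M_{m,k})^2 \mid \mathcal{A}_{m-1}\big] = \sum_{m=1}^{n} X_{m,k} \mathrm{Var}(D_{m,k}) = N^*_{n,k} \sigma_k^2, \tag{5.16} \]
$E[\Delta M_{n,k} \cdot \Delta M_{n,j} \mid \mathcal{A}_{n-1}] = 0$, $j \ne k$, and $E[|\Delta M_{n,k}|^p \mid \mathcal{A}_{n-1}] \le 2^p E[|D_{1,k}|^p]$. According to the law of the iterated logarithm for martingales, we have
\[ M_{n,k} = O\big(\sqrt{n \log\log n}\big) \ \text{a.s.} \tag{5.17} \]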
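Combining (5.3), (5.5), (5.11) and (5.17) yields that, for any $\delta > 0$,
\begin{align*}
a_k N^*_{n,0} - q_k N^*_{n,k} &= -M_{n,k} + o\big(n^{\frac{1}{3}+\frac{1}{3\varphi+3}+\frac{\delta}{2}}\big) \\
&= -M_{n,k} + o\big(n^{\frac{\gamma+1}{3\gamma}+\delta}\big) \tag{5.18} \\
&= O\big(\sqrt{n \log\log n}\big) \ \text{a.s.}, \quad k = 1, 2.
\end{align*}
Together with the fact that $N^*_{n,0} + N^*_{n,1} + N^*_{n,2} = n$, we have
\begin{align*}
N^*_{n,k} &= n \frac{a_k/q_k}{a_1/q_1 + a_2/q_2 + 1} + O\big(\sqrt{n \log\log n}\big) \\
&= n \frac{s}{s+1} v_k + O\big(\sqrt{n \log\log n}\big) \ \text{a.s.}, \quad k = 1, 2, \tag{5.19}
\end{align*}
\begin{align*}
a_1 q_2 N^*_{n,2} - a_2 q_1 N^*_{n,1} &= a_2 (a_1 N^*_{n,0} - q_1 N^*_{n,1}) - a_1 (a_2 N^*_{n,0} - q_2 N^*_{n,2}) \\
&= a_1 M_{n,2} - a_2 M_{n,1} + o\big(n^{\frac{\gamma+1}{3\gamma}+\delta}\big) \ \text{a.s.} \tag{5.20}
\end{align*}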
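The step from (5.18) to (5.19) is a small linear solve. As a sketch, write $r_n$ for a generic $O(\sqrt{n \log\log n})$ term (a shorthand introduced here only for illustration). Then (5.18) gives
\[ N^*_{n,k} = \frac{a_k}{q_k} N^*_{n,0} + r_n, \quad k = 1, 2, \]
and summing over $k$ together with $N^*_{n,0} + N^*_{n,1} + N^*_{n,2} = n$ yields
\[ (1 + s) N^*_{n,0} = n + r_n, \qquad N^*_{n,0} = \frac{n}{s+1} + r_n, \qquad N^*_{n,k} = n \frac{a_k/q_k}{s+1} + r_n = n \frac{s}{s+1} v_k + r_n, \]
where the last equality uses $v_k = (a_k/q_k)/s$. In particular, a fraction $1/(s+1)$ of the draws are immigration balls and a fraction $s/(s+1)$ are treatment assignments.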
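We consider the martingale $\{M_n := a_1 M_{n,2} - a_2 M_{n,1}\}$. From (5.16) and (5.19), it follows that
\begin{align*}
\sum_{m=1}^{n} E\big[(\Delta M_m)^2 \mid \mathcal{A}_{m-1}\big]
&= a_1^2 \sum_{m=1}^{n} E\big[(\Delta M_{m,2})^2 \mid \mathcal{A}_{m-1}\big] + a_2^2 \sum_{m=1}^{n} E\big[(\Delta M_{m,1})^2 \mid \mathcal{A}_{m-1}\big] \\
&= n \frac{s}{s+1} (a_1^2 v_2 \sigma_2^2 + a_2^2 v_1 \sigma_1^2) + O\big(\sqrt{n \log\log n}\big) \ \text{a.s.}
\end{align*}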
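By the Skorokhod Embedding Theorem (cf., Hall and Heyde (1980)), there exists an $\mathcal{A}_n$-adapted non-decreasing sequence of random variables $\{T_n\}$ and a standard Brownian motion $B$, such that
\[ E[\Delta T_n \mid \mathcal{A}_{n-1}] = E\big[(\Delta M_n)^2 \mid \mathcal{A}_{n-1}\big], \quad E|\Delta T_n|^{p/2} \le C_p E|\Delta M_n|^p \le c_p, \ \forall p > 2, \]
\[ M_n = B(T_n), \quad n = 1, 2, \ldots. \tag{5.21} \]
Note that $\{\sum_{m=1}^{n} (\Delta T_m - E[\Delta T_m \mid \mathcal{A}_{m-1}])\}$ is also a martingale. According to the Law of the Iterated Logarithm, we have
\begin{align*}
T_n &= \sum_{m=1}^{n} E[\Delta T_m \mid \mathcal{A}_{m-1}] + O\big(\sqrt{n \log\log n}\big) \\
&= n \frac{s}{s+1} (a_1^2 v_2 \sigma_2^2 + a_2^2 v_1 \sigma_1^2) + O\big(\sqrt{n \log\log n}\big) \ \text{a.s.}
\end{align*}
On the other hand, by (5.19), we have $N^*_{n,1} + N^*_{n,2} = [s/(s+1)] n + O(\sqrt{n \log\log n})$ a.s. So, $u_n = \max\{m : N^*_{m,1} + N^*_{m,2} \le n\} = [(s+1)/s] n + O(\sqrt{n \log\log n})$ a.s. It follows that
\[ T_{u_n} = n (a_1^2 v_2 \sigma_2^2 + a_2^2 v_1 \sigma_1^2) + O\big(\sqrt{n \log\log n}\big) \ \text{a.s.} \tag{5.22} \]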
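Substituting (5.22), (5.21) into (5.20) and applying the properties of Brownian motion (cf., Theorem 1.2.1 of Csörgő and Révész (1981)), we have
\begin{align*}
a_1 q_2 N_{n,2} - a_2 q_1 N_{n,1} &= a_1 q_2 N^*_{u_n,2} - a_2 q_1 N^*_{u_n,1} = B(T_{u_n}) + o\big(u_n^{\frac{\gamma+1}{3\gamma}+\delta}\big) \\
&= B\big(n (a_1^2 v_2 \sigma_2^2 + a_2^2 v_1 \sigma_1^2)\big) + O\big((n \log\log n)^{\frac{1}{4}} (\log n)^{\frac{1}{2}}\big) + o\big(n^{\frac{\gamma+1}{3\gamma}+\delta}\big) \\
&= B\big(n (a_1^2 v_2 \sigma_2^2 + a_2^2 v_1 \sigma_1^2)\big) + o\big(n^{\frac{\gamma+1}{3\gamma}+\delta}\big) \ \text{a.s.}
\end{align*}
Together with the fact that $N_{n,1} + N_{n,2} = n$, we have
\[ N_{n,1} = \frac{a_1 q_2}{a_1 q_2 + a_2 q_1} n - \frac{B\big(n (a_1^2 v_2 \sigma_2^2 + a_2^2 v_1 \sigma_1^2)\big)}{a_1 q_2 + a_2 q_1} + o\big(n^{\frac{\gamma+1}{3\gamma}+\delta}\big) \ \text{a.s.} \]
Notice that $\sigma^2 = (a_2^2 v_1 \sigma_1^2 + a_1^2 v_2 \sigma_2^2)/(a_1 q_2 + a_2 q_1)^2$, where $\sigma^2$ is defined in (3.1). Let
\[ W(t) = -\frac{1}{\sigma} \cdot \frac{B\big(t (a_1^2 v_2 \sigma_2^2 + a_2^2 v_1 \sigma_1^2)\big)}{a_1 q_2 + a_2 q_1}. \]
Then $\{W(t); t \ge 0\}$ is a standard Brownian motion. The proof of Theorem 3.1 is complete.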
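As a quick consistency check on the last step (a sketch; the shorthand $c$ and $d$ is introduced here only for illustration), write $c = a_1^2 v_2 \sigma_2^2 + a_2^2 v_1 \sigma_1^2$ and $d = a_1 q_2 + a_2 q_1$, so that $\sigma^2 = c/d^2$ and $v_1 = a_1 q_2/d$. Then
\[ W(t) = -\frac{B(ct)}{\sigma d} = -\frac{B(ct)}{\sqrt{c}}, \qquad \mathrm{Var}\, W(t) = \frac{ct}{c} = t, \]
so $W$ is a standard Brownian motion by the scaling and symmetry properties of $B$, and the expansion of $N_{n,1}$ obtained above can be read as
\[ N_{n,1} = n v_1 + \sigma W(n) + o\big(n^{\frac{\gamma+1}{3\gamma}+\delta}\big) \ \text{a.s.} \]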
Acknowledgement
The authors are grateful to Professor William F. Rosenberger and two ref-
erees for providing several helpful comments which led to a significant improve-
ment of the article. This work was supported by a grant from the Research
Grants Council of the Hong Kong Special Administrative Region (Project no.
CUHK400204). L.X. Zhang’s work was partially supported by grants from the
National Science Foundation of China (Project no. 10471126).
References
Andersen, J., Faries, D. and Tamura, R. (1994). A randomized Play-the-Winner design for
multi-arm clinical trials. Comm. Statist. Theory Methods 23, 309-323.
Athreya, K. B. and Karlin, S. (1968). Embedding urn schemes into continuous time branching
processes and related limit theorems. Ann. Math. Statist. 39, 1801-1817.
Athreya, K. B. and Ney, P. E. (1972). Branching Processes. Springer-Verlag, Berlin.
Bai, Z. D., Hu, F. and Rosenberger, W. F. (2002). Asymptotic properties of adaptive designs
for clinical trials with delayed response. Ann. Statist. 30, 122-139.
Beggs, A. W. (2005). On the convergence of reinforcement learning. J. Econom. Theory 122,
1-26.
Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic
Press, New York.
Durham, S. D., Flournoy, N. and Li, W. (1998). A sequential design for maximizing the probability
for a favorable response. Canad. J. Statist. 26, 479-495.
Eisele, J. (1994). The doubly adaptive biased coin design for sequential clinical trials. J. Statist.
Plann. Inference 38, 249-262.
Eisele, J. and Woodroofe, M. (1995). Central limit theorems for doubly adaptive biased coin
designs. Ann. Statist. 23, 234-254.
Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and its Applications. Academic
Press, London.
Hopkins, E. and Posch, M. (2005). Attainability of boundary points under reinforcement learn-
ing. Games Econom. Behav. 53, 110-125.
Hu, F. and Rosenberger, W. F. (2003). Optimality, variability, power: evaluating response-
adaptive randomization procedures for treatment comparisons. J. Amer. Statist. Assoc.
98, 671-678.
Hu, F., Rosenberger, W. F. and Zhang, L.-X. (2006). Asymptotically best response-adaptive
randomization procedures. J. Statist. Plann. Inference 136, 1911-1922.
Hu, F. and Zhang, L.-X. (2004a). Asymptotic properties of doubly adaptive biased coin designs
for multi-treatment clinical trials. Ann. Statist. 32, 268-301.
Hu, F. and Zhang, L.-X. (2004b). Asymptotic normality of urn models for clinical trials with
delayed response. Bernoulli 10, 447-463.
Ivanova, A. (2003). A play-the-winner type urn model with reduced variability. Metrika 58,
1-13.
Ivanova, A. and Flournoy, N. (2001). A birth and death urn for ternary outcomes: stochastic
processes applied to urn models. In Probability and Statistical Models with Applications
(Edited by Ch. A. Charalambides, M. V. Koutras and N. Balakrishnan), 583-600. Chapman
& Hall/CRC.
Ivanova, A., Rosenberger, W. F., Durham, S. D. and Flournoy, N. (2000). A birth and death
urn for randomized clinical trials: asymptotic methods. Sankhyā Ser. B 62, 104-118.
Melfi, V. F. and Page, C. (1998). Variability in adaptive designs for estimation of success
probabilities. In New Developments and Applications in Experimental Design (Edited by
Flournoy, N., Rosenberger, W. F. and Wong, W. K.), 106-114, Hayward: Institute of
Mathematical Statistics.
Melfi, V. F. and Page, C. (2000). Estimation after adaptive allocation. J. Statist. Plann. Infer-
ence 87, 353-363.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math.
Soc. 58, 527-535.
Rosenberger, W. F. (1996). New directions in adaptive designs. Statist. Sci. 11, 137-149.
Rosenberger, W. F. and Hu, F. (2004). Maximizing power and minimizing treatment failures.
Clinical Trials 1, 141-147.
Rosenberger, W. F. and Lachin, J. M. (2002). Randomization in Clinical Trials: Theory and
Practice. Wiley, New York.
Rosenberger, W. F., Stallard, N., Ivanova, A., Harper, C. N. and Ricks, M. L. (2001). Optimal
adaptive designs for binary response trials. Biometrics 57, 909-913.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in
view of the evidence of two samples. Biometrika 25, 275-294.
Wei, L. J. (1979). The generalized Polya’s urn for sequential medical trials. Ann. Statist. 7,
291-296.
Wei, L. J. (1988). Constructing exact two-sample permutational tests with the randomized
play-the-winner rule. Biometrika 75, 603-606.
Wei, L. J. and Durham, S. D. (1978). The randomized play-the-winner rule in medical trials.
J. Amer. Statist. Assoc. 73, 840-843.
Zhang, L. X., Hu, F. and Cheung, S. H. (2006). Asymptotic theorems of sequential estimation-
adjusted urn models. Ann. Appl. Probab. 16, 340-369.
Department of Mathematics, Zhejiang University, Hangzhou 310028, PR China.
E-mail: lxzhang@mail.hz.zj.cn
Department of Finance, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, PR
China.
E-mail: chanws@cuhk.edu.hk
Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, PR
China.
E-mail: shcheung@sta.cuhk.edu.hk
Department of Statistics, University of Virginia, Halsey Hall, Charlottesville, Virginia 22904-
4135, USA.
E-mail: fh6e@virginia.edu
(Received August 2004; accepted August 2005)