Micro-Randomized Trials in mHealth

Peng Liao1, Predrag Klasnja2, Ambuj Tewari1, and Susan A. Murphy1
1Department of Statistics, University of Michigan, Ann Arbor, MI 48109
2School of Information, University of Michigan, Ann Arbor, MI 48109
April 7, 2015
Abstract
The use and development of mobile interventions is experiencing rapid growth. In “just-in-time” mobile
interventions, treatments are provided via a mobile device that are intended to help an individual make healthy
decisions “in the moment,” and thus have a proximal, near future impact. Currently the development of mobile
interventions is proceeding at a much faster pace than that of associated data science methods. A first step
toward developing data-based methods is to provide an experimental design for use in testing the proximal
effects of these just-in-time treatments. In this paper, we propose a “micro-randomized” trial design for this
purpose. In a micro-randomized trial, treatments are sequentially randomized throughout the conduct of the
study, with the result that each participant may be randomized at the 100s or 1000s of occasions at which a
treatment might be provided. Further, we develop a test statistic for assessing the proximal effect of a treatment
as well as an associated sample size calculator. We conduct simulation evaluations of the sample size calculator
in various settings. Rules of thumb that might be used in designing the micro-randomized trial are discussed.
This work is motivated by our collaboration on the HeartSteps mobile application designed to increase physical
activity.
Key words: Micro-randomized Trial, Sample Size Calculation, mHealth
1 Introduction
The use and development of mobile interventions is experiencing rapid growth. Mobile interventions are
used across the health fields and include treatments used to improve HIV medication adherence [11, 14], to
improve activity [12], accompany counseling/pharmacotherapy in substance use [4, 18], reinforce abstinence in
addictions [1, 2] and to support recovery from alcohol dependence [9, 21]. Mobile interventions in maintaining
adherence to anti-retroviral therapy and smoking cessation have shown sufficient effectiveness and replicability
in trials and thus have been recommended for inclusion in health services [8].
However as Nilsen et al. [20] state “In fact, the development of mHealth technologies is currently progressing
at a much faster pace than the science to evaluate their validity and efficacy, introducing the risk that ineffective
or even potentially harmful or iatrogenic applications will be implemented.” Indeed reviews, while reporting
preliminary evidence of effectiveness, call for more programmatic, data-based approaches to constructing mobile
interventions [8, 19]. In particular these reviews call for research that focuses on data-informed development
of these complex multi-component interventions prior to their evaluation in standard randomized controlled
trials. But methods for using data to inform the design and evaluation of adaptive mobile interventions have
lagged behind the use and deployment of these interventions [13, 20, 26].
Many mobile interventions are designed to be “just-in-time” interventions, meaning that they intend to
provide treatments that help an individual make healthy decisions in the moment, such as engaging in a
desirable behavior (e.g., taking a medication on time) or effectively coping with a stressful situation. As such,
mobile interventions are often intended to have proximal, near-term effects. A first approach toward developing
data-based methods for evaluation of mobile health interventions is to provide an experimental design for use
in testing the proximal effects of the treatments. This paper proposes a micro-randomized trial design for this
purpose. In a micro-randomized trial, treatments are sequentially randomized throughout the conduct of the
study, with the result that each participant may be randomized at the hundreds or thousands of occasions at
which a treatment might be provided. This repeated randomization of treatments under investigation enables
causal modeling of each treatment’s time-varying proximal effect as well as modeling of time-varying effect
moderation. Thus, the micro-randomized trial can be seen as a first experimental step in the development
of effective mobile interventions that are composed of sequences of treatments. We propose to size the trial
to detect the proximal main effect of the treatments. This is akin to the use of factorial designs for use in
constructing multi-component interventions. In these factorial designs [3, 6], a first analysis often involves
testing if the main effect of each treatment is equal to 0.
Corresponding author. 439 West Hall, 1085 South University Ave, Ann Arbor, MI 48109. Email: pengliao@umich.edu
arXiv:1504.00238v1 [stat.ME] 1 Apr 2015
This work is motivated by our collaboration on the HeartSteps mobile application for increasing physical
activity, which we will use to illustrate our discussion. One of the treatments in HeartSteps is suggestions for
physical activity which are tailored to the person’s current context. HeartSteps can deliver these suggestions
at any of the five time intervals during the day, which correspond roughly to morning commute, mid-day,
mid-afternoon, evening commute, and post-dinner times. When a suggestion is delivered, the user’s phone
plays a notification sound, vibrates and lights up, and the suggestion is displayed on the lock screen of the phone.
These suggestions encourage activity in the current context and are intended to have an effect (getting a person
to walk) within the next hour.
In the following section, we introduce the micro-randomized trial design. In Section 3 we precisely define
the proximal main effect of a treatment, using the language of potential outcomes. We develop the test statistic
for assessing the proximal effect of a treatment as well as an associated sample size calculator in Sections 4 and 5.
In Section 6 we provide a simulation evaluation of the sample size calculator. We end, in Section 7, with a discussion.
2 Micro-Randomized Trial
In general an individual’s longitudinal data, recorded via mobile devices that sense and provide treatments, can
be written as
$$\{S_0, S_1, A_1, S_2, A_2, \ldots, S_t, A_t, \ldots, S_T, A_T, S_{T+1}\}$$
where $t$ indexes decision times, $S_0$ is a vector of baseline information (gender, ethnicity, etc.) and
$S_t$ ($t \ge 1$) is information collected between time $t-1$ and $t$ (e.g. summary measures of recent activity
levels, engagement, and burden; day of week; weather; busyness indicated by smartphone calendar, etc.). The
treatment at time $t$ is denoted by $A_t$; throughout this paper we consider binary options for the treatments
(e.g., the treatment is on or off). The proximal response, denoted by $Y_{t+1}$, is a known function of
$\{S_t, A_t, S_{t+1}\}$. Here we assume that the longitudinal data are independent and identically distributed
across $N$ individuals. Note that this assumption would be violated if, for example, some of the treatments are
used to enhance social support between individuals in the study.
In HeartSteps, data ($S_t$) is collected both passively via sensors and via participant self-report. Each participant
is provided a “Jawbone” band [5], worn at the wrist, which collects daily step count and the amount of sleep the
user had the previous night. Furthermore sensors on the phone are used to collect a variety of information at
each of the 5 time points during the day, including the time-stamp, location, busyness of planned activities on
the phone calendar and other activity on the phone. Each evening, self-report data is collected including utility
and burden ratings. The proximal response, $Y_{t+1}$, for activity suggestions is the step count in the hour
following time $t$.
A decision time is a point in time at which, based on the participant’s current state, past behavior, or current
context, treatment may need to be delivered. Decision times vary by the nature of the intervention component.
In HeartSteps, the decision times for activity suggestions are 5 times per day over the 42 day study duration.
For an alcohol-recovery application that provides an intervention when an individual goes within 10 feet of a
high risk location (e.g. a liquor store), decision points might be every 2 minutes, the frequency at which the
application would get the person’s current location and assess whether she is close to a high-risk location. In
a long-term study of an intervention for multiple health behaviors, the decision points might be weekly or
monthly, at which times decisions are made regarding whether to change the focus from one behavior (e.g.,
physical activity) to another (e.g., diet). Finally, in many studies there is an option for an individual to press a
“panic” button, indicating the need for help; for such interventions, decision times correspond to times at which
the panic button might be pressed.
A micro-randomized trial is a trial in which at each decision time $t$, participants are randomized to a
treatment option, denoted by $A_t$. Treatment options may correspond to whether or not a treatment is provided
at a decision time; for example in HeartSteps, whether or not the individual is provided a lock-screen activity
suggestion. Or treatment options may be alternative types of treatment that can be provided at the same decision
time; for example, a daily step goal treatment might have two options, a fixed 10,000-steps-a-day goal or an
adaptive goal based on the user’s activity level on the previous day. Considerations of treatment burden often
imply that the randomization will not be uniform. For example in HeartSteps, $P[A_t = 1] = 0.4$ so that, if an
individual is always available, on average 2 lock-screen activity messages are delivered per day.
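As a rough illustration of the randomization scheme just described, the following Python sketch draws the treatment indicators for one hypothetical, always-available participant. The function names and the seed are illustrative assumptions and not part of HeartSteps; only the figures of 5 decision times per day, 42 days and a randomization probability of 0.4 come from the text.

```python
import random

def expected_treatments_per_day(times_per_day, rho):
    """Expected number of delivered treatments per day for an always-available participant."""
    return times_per_day * rho

def micro_randomize(n_days, times_per_day, rho, seed=None):
    """Draw A_t ~ Bernoulli(rho) independently at every decision time (hypothetical helper)."""
    rng = random.Random(seed)
    return [1 if rng.random() < rho else 0 for _ in range(n_days * times_per_day)]

# One participant, HeartSteps-like settings: 42 days, 5 decision times per day, rho = 0.4.
schedule = micro_randomize(n_days=42, times_per_day=5, rho=0.4, seed=0)
print(len(schedule))                        # number of decision times, 42 * 5 = 210
print(expected_treatments_per_day(5, 0.4))  # 5 * 0.4 = 2.0 messages per day on average
```

Because the draws are independent across decision times, the realized number of messages on any single day can differ from 2; only the long-run average is fixed by the randomization probability.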
In designing, that is, determining the sample size for, a micro-randomized trial we focus on the reduced
longitudinal data
$$\{S_0, I_1, A_1, Y_2, I_2, A_2, Y_3, \ldots, I_t, A_t, Y_{t+1}, \ldots, I_T, A_T, Y_{T+1}\}.$$
The variable $I_t$ is an “availability” indicator. The availability indicator is coded as $I_t = 1$ if the
individual is available for treatment and $I_t = 0$ otherwise. At some decision times feasibility, ethics or burden
considerations mean that the individual is unavailable for treatment and thus $A_t$ should not be delivered.
Consider again HeartSteps: if sensors indicate that the individual is likely driving a car or the individual is
currently walking, then the lock-screen activity message should not occur. Other examples of when individuals are
unavailable for treatment include: in the alcohol recovery setting, a “warning” treatment would only be potentially
provided when sensors indicate that the individual is within 10 feet of a high risk location, or a treatment might
only be provided if the individual reports a high level of craving. If the application has a panic button, then
only in an $x$ second interval in which the panic button is pressed is it appropriate to provide “panic button”
treatments. Individuals may be unavailable for treatment by choice. For example, the HeartSteps application
permits the individual to turn off the lock-screen activity messages; this option is considered critical to
maintaining participant buy-in and engagement with HeartSteps. After viewing the lock-screen activity message, the
individual has the option of turning off the lock-screen message for 4 or 8 or 12 hours. After the specified time
interval, the lock-screen message automatically turns on again. To summarize, the availability indicator at time
$t$ is the indicator for the subpopulation at time $t$ among which we are interested in assessing the proximal main
effect of the treatment; we are uninterested in assessing the proximal main effect of a treatment among individuals
for whom it is unethical to provide treatment or for whom it makes no scientific sense to provide treatment or among
those who refuse to be provided a treatment.
3 Proximal Main Effect of a Treatment
As discussed above, treatments in mobile health interventions are often designed so as to have a proximal
effect (e.g., increase activity in near future, help an individual manage current cravings for drugs or food, take
medications on schedule, etc.). As a result, a first question in developing a mobile health intervention is whether
the treatments have a proximal effect. Here we develop sample size formulae that guarantee a stated power to
detect the proximal effect of a treatment. In particular we aim to test if the proximal main effect is zero.
To define the proximal main effect of a treatment, we use potential outcomes [22, 23, 25]. Our use of
potential outcome notation is slightly more complicated than usual because treatment can only be provided
when an individual is available. As a result, we index the potential outcomes by decision rules that incorporate
availability. In particular define $d(a, i)$ for $a \in \{0, 1\}$, $i \in \{0, 1\}$ by
$d(a, 0) =$ “unavailable-do nothing” and $d(a, 1) = a$.
Then for each $a_1 \in \mathcal{A}_1 = \{0, 1\}$, define $D_1(a_1) = d(a_1, I_1)$. Then we denote the potential
proximal responses following decision time 1 by $\{Y_2^{D_1(1)}, Y_2^{D_1(0)}\}$ and denote the potential
availability indicators at decision time 2 by $\{I_2^{D_1(1)}, I_2^{D_1(0)}\}$. Next for each
$\bar a_2 = (a_1, a_2)$ with $a_1, a_2 \in \{0, 1\}$, define $D_2(\bar a_2) = d(a_2, I_2^{D_1(a_1)})$. Define
$\bar D_2(\bar a_2) = (D_1(a_1), D_2(\bar a_2))$. A potential proximal response following decision time 2 and
corresponding to $\bar a_2$ is $Y_3^{\bar D_2(\bar a_2)}$ and a potential availability indicator at decision
time 3 is $I_3^{\bar D_2(\bar a_2)}$. Similarly, for each
$\bar a_t = (a_1, \ldots, a_t) \in \mathcal{A}_t = \{(a_1, \ldots, a_t) \mid a_i \in \{0, 1\}, i = 1, \ldots, t\}$,
define $D_t(\bar a_t) = d(a_t, I_t^{\bar D_{t-1}(\bar a_{t-1})})$ and
$\bar D_t(\bar a_t) = (D_1(a_1), \ldots, D_t(\bar a_t))$. For each
$\bar a_t = (a_1, \ldots, a_t) \in \mathcal{A}_t$, the potential proximal response is
$Y_t^{\bar D_{t-1}(\bar a_{t-1})}$ (following decision time $t-1$) and the potential availability indicator is
$I_t^{\bar D_{t-1}(\bar a_{t-1})}$ at decision time $t$.
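The decision rule $d(a, i)$ can be made concrete with a short Python sketch. It is only an illustration under a simplifying assumption: the availability indicators are passed in as an already-realized sequence, whereas in the potential outcomes framework each indicator depends on the earlier decisions. The names `UNAVAILABLE` and `decision_history` are hypothetical.

```python
UNAVAILABLE = "unavailable-do nothing"

def d(a, i):
    """Decision rule d(a, i): deliver option a only when the availability indicator i is 1."""
    if a not in (0, 1) or i not in (0, 1):
        raise ValueError("a and i must each be 0 or 1")
    return a if i == 1 else UNAVAILABLE

def decision_history(a_bar, i_bar):
    """Return (D_1, ..., D_t) for treatment options a_bar and realized availability i_bar."""
    if len(a_bar) != len(i_bar):
        raise ValueError("a_bar and i_bar must have the same length")
    return [d(a, i) for a, i in zip(a_bar, i_bar)]

# Treatment options (1, 0, 1) with the participant unavailable at the third decision time.
print(decision_history([1, 0, 1], [1, 1, 0]))  # [1, 0, 'unavailable-do nothing']
```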
We define the proximal main effect of a treatment at time $t$ among available individuals by:
$$\beta(t) = E\left( Y_{t+1}^{\bar D_t(\bar A_{t-1}, 1)} - Y_{t+1}^{\bar D_t(\bar A_{t-1}, 0)} \,\Big|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1 \right)$$
where the expectation is taken with respect to the distribution of the potential outcomes and randomization in
$\bar A_{t-1}$. This proximal effect is conditional in that the effect of treatment at time $t$ is defined for only
individuals available for treatment at time $t$, that is, $I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1$. This proximal
effect is a main effect in that the effect is marginal over any effects of $\bar A_{t-1}$. The former conditional
aspect of the definition is related to the concept of viable or feasible dynamic treatment regimes [24, 28] in
which one assesses only the causal effect of treatments that can actually be provided.
Consider the proximal main effect, $\beta(t)$, as $t$ varies across time. $\beta(t)$ may vary across time for a
variety of reasons. To see this consider the case of HeartSteps. Here $\beta(t)$ might initially increase with
increasing $t$ as participants learn and practice the activities suggested on the lock-screen. For larger $t$ one
might expect to see decreasing or flat $\beta(t)$ due to habituation (participants begin to, at least partially,
ignore the messages). This time variation in $\beta(t)$ can be attributed to both the immediate effect of a
lock-screen activity message as well as interactions between the past lock-screen activity messages and the present
activity message; the time variation occurs at least partially due to the marginal character of $\beta(t)$.
Alternately the conditional definition of $\beta(t)$ means that the effect is only defined among the population of
individuals who are available at decision time $t$. Changes in this population may cause changes in $\beta(t)$
across time. Again consider HeartSteps. At earlier time points, participants are highly engaged, yet have not
developed habits that in various ways increase their activity, thus most participants will be available. However as
time progresses, some participants may develop sufficiently positive activity habits or anticipate activity
suggestions, thus at later decision times these participants may be already active and thus unavailable to receive
a suggestion. Other participants may become increasingly disengaged and repeatedly turn off the lock-screen activity
messages; these participants are also unavailable. Thus as time progresses, $\beta(t)$ may vary due to the
subpopulation of participants among whom it is appropriate to assess the effect of the lock-screen activity message.
Our main objective in determining the sample size will be to assure sufficient power to detect alternatives to
the null hypothesis of no proximal main effect, $H_0: \beta(t) = 0$, $t = 1, \ldots, T$ for a trial with $T$
decision points (if $\beta(t)$ is nonzero then for the population available at decision time $t$, there is a
proximal effect). The proposed test will be focused on detecting smooth, i.e., continuous in $t$, alternatives to
this null hypothesis.
To express $\beta(t)$ in terms of the observed data distribution, we assume consistency [22, 23]. This assumption
is that for each $t$, the observed $Y_t$ and observed $I_t$ equal the corresponding potential outcomes,
$Y_t^{\bar D_{t-1}(\bar a_{t-1})}$, $I_t^{\bar D_{t-1}(\bar a_{t-1})}$ whenever $\bar A_{t-1} = \bar a_{t-1}$.
This assumption may be violated if some of the treatments promote social linkages between participants, for
example, to enhance social/emotional support or to compete in mobile games. In these cases it would be more
appropriate to additionally index each individual’s potential outcomes by other participants’ treatments. The
micro-randomization plus the consistency assumption implies that the proximal main effect of treatment at time $t$
among available individuals, $\beta(t)$, can be written as
$$
\begin{aligned}
\beta(t) &= E\left[ Y_{t+1}^{\bar D_t(\bar A_{t-1}, 1)} \,\Big|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1 \right]
          - E\left[ Y_{t+1}^{\bar D_t(\bar A_{t-1}, 0)} \,\Big|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1 \right] \\
         &= E\left[ Y_{t+1}^{\bar D_t(\bar A_{t-1}, 1)} \,\Big|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1, A_t = 1 \right]
          - E\left[ Y_{t+1}^{\bar D_t(\bar A_{t-1}, 0)} \,\Big|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1, A_t = 0 \right] \\
         &= E\left[ Y_{t+1}^{\bar D_t(\bar A_t)} \,\Big|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1, A_t = 1 \right]
          - E\left[ Y_{t+1}^{\bar D_t(\bar A_t)} \,\Big|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1, A_t = 0 \right] \\
         &= E[Y_{t+1} \mid I_t = 1, A_t = 1] - E[Y_{t+1} \mid I_t = 1, A_t = 0]
\end{aligned}
$$
where the second equality follows from the randomization of the $A_t$s and the last equality follows from the
consistency assumption.
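The last line of this derivation suggests a simple plug-in estimate at a single decision time: the difference in mean proximal response between treated and untreated participants, restricted to those who are available. The Python sketch below illustrates that idea only; it is not the test statistic developed in the next section, and the function name and toy numbers are made up for illustration.

```python
def proximal_effect_estimate(records):
    """Difference in mean Y_{t+1} between A_t = 1 and A_t = 0 among available participants.

    records: iterable of (available, treatment, response) tuples for one decision time t.
    """
    treated = [y for i, a, y in records if i == 1 and a == 1]
    control = [y for i, a, y in records if i == 1 and a == 0]
    if not treated or not control:
        raise ValueError("need at least one available participant in each arm")
    return sum(treated) / len(treated) - sum(control) / len(control)

# Toy step counts (made up); the unavailable participant in the last row is ignored.
toy = [(1, 1, 120.0), (1, 1, 80.0), (1, 0, 60.0), (1, 0, 40.0), (0, 0, 500.0)]
print(proximal_effect_estimate(toy))  # (120 + 80) / 2 - (60 + 40) / 2 = 50.0
```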
4 Test Statistic
Our sample size formula is based on a test statistic for use in testing $H_0: \beta(t) = 0$, $t = 1, \ldots, T$
against a scientifically plausible alternative. This alternative should be formed based on conversations with
domain experts. Here we construct a test statistic to detect alternatives that are, at least approximately, linear
in a vector parameter, $\beta$, that is, alternatives of the form $Z_t'\beta$, where the $p \times 1$ vector,
$Z_t$, is a function of $t$ and covariates that are unaffected by treatment such as time of day or day of week. In
the case of HeartSteps, a plausible alternative is quadratic:
$$Z_t'\beta = \left( 1, \left\lfloor \tfrac{t-1}{5} \right\rfloor, \left\lfloor \tfrac{t-1}{5} \right\rfloor^2 \right) \beta \tag{1}$$
where $\beta = (\beta_1, \beta_2, \beta_3)'$ ($p = 3$). Recall that in HeartSteps there are 5 decision times per
day; $\lfloor \frac{t-1}{5} \rfloor$ translates decision times $t$ to days. This rather simplistic parametrization
marginalizes across the day and treats the weekends and weekdays similarly.
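To make the quadratic parametrization in (1) concrete, the following Python sketch builds $Z_t$ for a given decision time and evaluates $Z_t'\beta$. The function names are hypothetical; the only inputs taken from the text are the 5 decision times per day and the floor mapping from decision times to days.

```python
def z_heartsteps(t, times_per_day=5):
    """Z_t = (1, day, day^2) with day = floor((t - 1) / times_per_day), as in display (1)."""
    if t < 1:
        raise ValueError("decision times are indexed from 1")
    day = (t - 1) // times_per_day
    return (1, day, day ** 2)

def effect_at(t, beta):
    """Evaluate Z_t' beta for a length-3 coefficient vector beta."""
    return sum(z * b for z, b in zip(z_heartsteps(t), beta))

print(z_heartsteps(1), z_heartsteps(5), z_heartsteps(6), z_heartsteps(210))
# (1, 0, 0) (1, 0, 0) (1, 1, 1) (1, 41, 1681)
```

All five decision times within a day share the same $Z_t$, which is what the text means by marginalizing across the day.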
We propose to use the alternate, $H_1: \beta(t) = Z_t'\beta$, $t = 1, \ldots, T$ to construct the test statistic.
We base the test statistic on the estimator of $\beta$ in a least squares fit of a working model. A simple working
model based on the alternative is:
$$E[Y_{t+1} \mid I_t = 1, A_t] = B_t'\alpha + (A_t - \rho_t) Z_t'\beta \tag{2}$$
over all $t \in \{1, \ldots, T\}$, where $\rho_t$ is the known randomization probability ($P[A_t = 1] = \rho_t$)
and the $q \times 1$ vector $B_t$ is a function of $t$ and covariates that are unaffected by treatment such as time
of day or day of week. Note that $A_t$ is centered by subtracting off the randomization probability; thus the
working model for $\alpha(t) = E[Y_{t+1} \mid I_t = 1]$ is $B_t'\alpha$. The estimators $\hat\alpha$, $\hat\beta$
minimize the least squares error:
$$P_N\left\{ \sum_{t=1}^{T} I_t \left( Y_{t+1} - B_t'\alpha - (A_t - \rho_t) Z_t'\beta \right)^2 \right\} \tag{3}$$
where $P_N\{f(X)\}$ is defined as the average of $f(X)$ over the sample.
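A minimal sketch of the least squares fit may help fix ideas. To stay short it assumes the simplest special case of criterion (3), $B_t = Z_t = 1$ (so $p = q = 1$), in which the fit reduces to a simple linear regression of $Y_{t+1}$ on the centered treatment indicator over available decision times. The function name and the toy data are assumptions made for illustration.

```python
def fit_constant_effect(records, rho):
    """Least squares fit of Y = alpha + (A - rho) * beta over available (I = 1) decision times.

    records: iterable of (available, treatment, response), pooled over participants and times.
    Special case of criterion (3) with B_t = Z_t = 1 (an assumption made for brevity).
    """
    pts = [(a - rho, y) for i, a, y in records if i == 1]
    n = len(pts)
    if n == 0:
        raise ValueError("no available decision times")
    x_bar = sum(x for x, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pts)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    if sxx == 0:
        raise ValueError("both treatment options must occur among available records")
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Noise-free toy data generated from alpha = 2, beta = 3, rho = 0.4, plus one unavailable row.
rho = 0.4
toy = [(1, a, 2 + (a - rho) * 3) for a in (1, 0, 0, 1, 0)] + [(0, 0, 99.0)]
alpha_hat, beta_hat = fit_constant_effect(toy, rho)
print(round(alpha_hat, 6), round(beta_hat, 6))  # 2.0 3.0
```

With vector-valued $B_t$ and $Z_t$ the same criterion leads to a multivariate least squares problem, which would normally be handed to a linear algebra routine rather than coded by hand.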
Note that from a technical perspective, minimizing the least squares criterion, (3), is reminiscent of a
GEE analysis [16] with identity link function and a working correlation matrix equal to the identity. Thus it is
natural to consider a non-identity working correlation matrix as is common in GEE. This, however, is problematic
from a causal inference perspective. To see this suppose that the true conditional expectation is in fact
$E[Y_{t+1} \mid I_t = 1, A_t] = B_t'\alpha + (A_t - \rho_t) Z_t'\beta$, that is, the causal parameter, $\beta(t)$,
is equal to $Z_t'\beta$. Further suppose that the working correlation matrix has off-diagonal elements and that we
estimate $\beta$ by minimizing the weighted (by the inverse of the working correlation matrix) least squares
criterion. In this case the resulting estimating equations include sums of terms such as
$I_t \left( Y_{t+1} - B_t'\alpha - (A_t - \rho_t) Z_t'\beta \right) I_s (A_s - \rho_s) Z_s$ for $t > s$.
Unfortunately, both availability at time $t$, $I_t$, as well as $Y_{t+1}$ may be affected by treatment in the past
(in particular, $A_s$), thus absent strong assumptions
$E\left[ I_t \left( Y_{t+1} - B_t'\alpha - (A_t - \rho_t) Z_t'\beta \right) I_s (A_s - \rho_s) \right]$
is unlikely to be 0. Recall that a minimal condition for consistency of estimators of $(\alpha, \beta)$ is that the
estimating equations have expectation 0, thus absent further assumptions, the estimators derived from the weighted
least squares criterion are likely biased. Another possibility is to include a time-varying variance term in the
least squares criterion, that is, the $t$th entry in (3) might be weighted by a $\sigma_t^{-2}$. This would be
useful in the data analysis, however for sample size calculations, values of these variances are unlikely to be
available. Thus for simplicity we use the unweighted least squares criterion in (3).
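Assume that the matrices $Q = \sum_{t=1}^{T} E[I_t] \rho_t (1 - \rho_t) Z_t Z_t'$ and
$\sum_{t=1}^{T} E[I_t] B_t B_t'$ are invertible. The least squares estimators, $\hat\alpha$, $\hat\beta$ are
consistent estimators of
$$\tilde\alpha = \left( \sum_{t=1}^{T} E[I_t] B_t B_t' \right)^{-1} \sum_{t=1}^{T} E[I_t] \alpha(t) B_t \tag{4}$$
and
$$\tilde\beta = \left( \sum_{t=1}^{T} E[I_t] \rho_t (1 - \rho_t) Z_t Z_t' \right)^{-1} \sum_{t=1}^{T} E[I_t] \rho_t (1 - \rho_t) \beta(t) Z_t \tag{5}$$
respectively. Furthermore if $\beta(t)$ is in fact equal to $Z_t'\beta$ for some $\beta$, then
$Z_t'\tilde\beta = \beta(t)$. This is the case even if $E[Y_{t+1} \mid I_t = 1] \ne B_t'\tilde\alpha$. In the
appendix (Lemma 1), we prove these results and also show that, under moment conditions,
$\sqrt{N}(\hat\beta - \tilde\beta)$ is asymptotically normal with mean 0 and variance
$\Sigma_\beta = Q^{-1} W Q^{-1}$ where
$$W = E\left[ \left( \sum_{t=1}^{T} \tilde\epsilon_t I_t (A_t - \rho_t) Z_t \right) \times \left( \sum_{t=1}^{T} \tilde\epsilon_t I_t (A_t - \rho_t) Z_t' \right) \right]$$
and $\tilde\epsilon_t = Y_{t+1} - I_t B_t'\tilde\alpha - (A_t - \rho_t) I_t Z_t'\tilde\beta$. To test the null
hypothesis $H_0: \beta(t) = 0$, $t = 1, \ldots, T$, one can use a test statistic based on the alternative, e.g.
$$N \hat\beta' \hat\Sigma_\beta^{-1} \hat\beta \tag{6}$$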
where $\hat\Sigma_\beta = \hat Q^{-1} \hat W \hat Q^{-1}$ and $\hat Q$ and $\hat W$ are plug in estimators. Note
that this test statistic results from a GEE analysis with identity link function and a working correlation matrix
equal to the identity matrix for which sample size formulae have been developed [27]. We build on this work as
follows. As Tu et al. [27] discuss, under the null hypothesis the large sample distribution of this statistic is a
chi-squared distribution with $p$ degrees of freedom. If $N$, the sample size, is small, then, as recommended in
[17], we make small adjustments to improve the small sample approximation to the distribution of the test
statistic. In particular Mancl and DeRouen recommend adjusting $\hat W$ using the “hat” matrix; see the formulae
for the adjusted $\hat W$ as well as $\hat Q$ in Appendix A. Also in small sample settings, investigators commonly
suggest that instead of using a critical value based on the chi-squared distribution, a critical value based on the
$t$ distribution should be used [15]. As we are considering a simultaneous test for multiple parameters we form the
critical value based on Hotelling’s $T$ squared distribution [10]. Hotelling’s $T$ squared distribution is a
multiple of the $F$ distribution given by $\frac{d_1 (d_1 + d_2 - 1)}{d_2} F_{d_1, d_2}$; here we use $d_1 = p$ and
$d_2 = N - q - p$ (recall $q$ is the number of parameters in the nuisance parameter vector, $\alpha$); see the
appendix for a rationale. In the following, the rejection region for the test of $H_0: \beta(t) = 0$,
$t = 1, \ldots, T$ based on (6) is
$$\left\{ N \hat\beta' \hat\Sigma_\beta^{-1} \hat\beta > F^{-1}_{p, N-q-p}\left( \frac{(N-q-p)(1-\alpha_0)}{p(N-q-1)} \right) \right\}$$
where $\alpha_0$ is the desired significance level.
5 Sample Size Formulae
As Tu et al. [27] have developed general sample size formulas in the GEE setting, here we focus on considerations
specific to the setting of micro-randomized trials. To size the study, we will determine the sample size needed to
detect the alternate, $\beta(t)$, with:
$$H_1: \beta(t) / \bar\sigma = d(t), \quad t = 1, \ldots, T$$
where $\bar\sigma^2 = (1/T) \sum_{t=1}^{T} E\left[ \mathrm{Var}\left( Y_{t+1} \mid I_t = 1, A_t \right) \right]$ is
the average variance and $d(t)$ is a standardized treatment effect. When $N$ is large and $H_1$ holds,
$N \hat\beta' \hat\Sigma_\beta^{-1} \hat\beta$ is approximately distributed as a noncentral chi-squared
$\chi^2_p(c_N)$, where $c_N$, the non-centrality parameter, satisfies
$c_N = N (\bar\sigma \tilde d)' \Sigma_\beta^{-1} (\bar\sigma \tilde d)$, and
$\tilde d = \left( \sum_{t=1}^{T} E[I_t] \rho_t (1 - \rho_t) Z_t Z_t' \right)^{-1} \sum_{t=1}^{T} E[I_t] \rho_t (1 - \rho_t) d(t) Z_t$
[27]. Note that $\tilde d = \tilde\beta / \bar\sigma$.
Working Assumptions. To derive the sample size formula, we use the form of the non-centrality parameter
of the limiting non-central chi-squared distribution, along with working assumptions. The working assumptions
are used to simplify the form of $\Sigma_\beta^{-1}$. In particular, we make the following working assumptions:
(a) $E(Y_{t+1} \mid I_t = 1) = B_t'\alpha$, for some $\alpha \in \mathbb{R}^q$
(b) $\beta(t) = Z_t'\beta$ for some $\beta \in \mathbb{R}^p$
(c) $\mathrm{Var}(Y_{t+1} \mid I_t = 1, A_t)$ is constant in $t$ and $A_t$
(d) $E[\tilde\epsilon_t \tilde\epsilon_s \mid I_t = 1, I_s = 1, A_t, A_s]$ is constant in $A_t$, $A_s$
where, as before, $\tilde\epsilon_t = Y_{t+1} - I_t B_t'\tilde\alpha - (A_t - \rho_t) I_t Z_t'\tilde\beta$. See the
proof in Appendix A (Lemma 2). The above working assumptions are somewhat simplistic but as will be seen below the
resulting sample size formula is robust to moderate violations. First, under these working assumptions the
alternative hypothesis can be re-written as
$$H_1: \beta / \bar\sigma = d, \tag{7}$$
where $d$ is a $p$ dimensional vector of standardized effects. Furthermore, $\Sigma_\beta$ is given by
$$\Sigma_\beta = \bar\sigma^2 \left( \sum_{t=1}^{T} E[I_t] \rho_t (1 - \rho_t) Z_t Z_t' \right)^{-1},$$
and thus $c_N$ is given by
$$c_N = N d' \left( \sum_{t=1}^{T} E[I_t] \rho_t (1 - \rho_t) Z_t Z_t' \right) d. \tag{8}$$
To improve the small sample approximation, we use the multiple of the $F$-distribution as discussed above. Thus
the sample size, $N$, is found by solving
$$\frac{p(N-q-1)}{N-q-p} F_{p, N-q-p; c_N}\left( F^{-1}_{p, N-q-p}\left( \frac{(N-q-p)(1-\alpha_0)}{p(N-q-1)} \right) \right) = 1 - \beta_0 \tag{9}$$
where $F_{p, N-q-p; c_N}$ is the noncentral $F$ distribution with noncentrality parameter $c_N$, and
$1 - \beta_0$ is the desired power. The inputs to this sample size formula are $\{Z_t\}_{t=1}^{T}$, a
scientifically meaningful value for $d$ (see below for an illustration), the time-varying availability pattern,
$\{E[I_t]\}_{t=1}^{T}$, the desired significance level, $\alpha_0$, and power, $1 - \beta_0$.
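The search for $N$ can be sketched in Python using only the standard library. The sketch adopts one conventional reading of the $F$-based calibration, which is an assumption here and should be checked against the appendix: power is taken to be the probability that a noncentral $F$ variate with $p$ and $N - q - p$ degrees of freedom and noncentrality $c_N$ exceeds the $1 - \alpha_0$ quantile of the central $F$ distribution. All function names are hypothetical, the numerical routines (a continued fraction for the incomplete beta function, a Poisson mixture for the noncentral $F$ distribution, bisection for the quantile) are generic textbook methods rather than anything specified in this paper, and the value 0.4 for $c_N / N$ is arbitrary.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-13):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = c * d
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(x, d1, d2, nc=0.0):
    """CDF of the F distribution; for nc > 0, a Poisson mixture of incomplete betas."""
    if x <= 0.0:
        return 0.0
    y = d1 * x / (d1 * x + d2)
    if nc == 0.0:
        return betai(d1 / 2.0, d2 / 2.0, y)
    total, j, log_w = 0.0, 0, -nc / 2.0
    while True:
        w = math.exp(log_w)  # Poisson(nc / 2) weight of the j-th mixture term
        total += w * betai(d1 / 2.0 + j, d2 / 2.0, y)
        if j > nc / 2.0 and w < 1e-14:
            break
        j += 1
        log_w += math.log(nc / 2.0) - math.log(j)
    return total

def f_quantile(prob, d1, d2):
    """Central F quantile by bisection (adequate for a sketch)."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < prob:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def power(n, p, q, nc_per_participant, alpha0=0.05):
    """Approximate power at sample size n under the assumed reading of the F calibration."""
    d2 = n - q - p
    if d2 < 1:
        return 0.0
    crit = f_quantile(1.0 - alpha0, p, d2)
    return 1.0 - f_cdf(crit, p, d2, nc=n * nc_per_participant)

def sample_size(p, q, nc_per_participant, alpha0=0.05, target_power=0.80, n_max=10000):
    """Smallest N whose approximate power reaches the target (linear search)."""
    for n in range(p + q + 1, n_max + 1):
        if power(n, p, q, nc_per_participant, alpha0) >= target_power:
            return n
    raise ValueError("target power not reached below n_max")

# Illustrative call: p = q = 3 and an arbitrary value c_N / N = 0.4.
n_required = sample_size(p=3, q=3, nc_per_participant=0.4)
print(n_required, round(power(n_required, 3, 3, 0.4), 3))
```

Here `nc_per_participant` stands for $c_N / N$ from (8), so the availability pattern, randomization probabilities and standardized effects enter only through that single number.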
Now we describe how the information needed in the sample size formula might be obtained when the
alternative is quadratic ($p = 3$, (1)). In this case we first elicit the initial standardized proximal main effect
given by $Z_1'\beta / \bar\sigma = \beta_1 / \bar\sigma$. Second we elicit the averaged across time, standardized
proximal main effect $\bar d = \frac{1}{T} \sum_{t=1}^{T} Z_t'\beta / \bar\sigma$. Lastly we elicit the time at
which the proximal main effect is maximal, i.e. $\arg\max_t Z_t'\beta$. These three quantities can then be used to
solve for $d = (d_1, d_2, d_3)'$. For example, in HeartSteps, we might want to determine the sample size to ensure
80% power when there is no initial treatment effect on the first day, and the maximum proximal main effect comes
around day 29. We specify the expected availability, $E[I_t]$, to be constant in $t$ and $Z_t$ is given by (1).
Table I gives sample sizes for HeartSteps under a variety of average standardized proximal main effects ($\bar d$).
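The three elicited quantities determine $d$ through two linear conditions and the vertex of the quadratic. The Python sketch below solves for $d$ and evaluates $c_N / N$ from (8) under constant availability. It rests on assumptions that are not stated in the text: days are indexed from 0, as $\lfloor (t-1)/5 \rfloor$ is, so that "day 29" is taken to be index 28, and the function names are hypothetical.

```python
def quadratic_effect_vector(avg_effect, max_day, n_days=42, initial_effect=0.0):
    """Solve for d = (d1, d2, d3) in Z_t'd = d1 + d2 * day + d3 * day**2, day = 0..n_days-1.

    Every decision time within a day shares the same Z_t, so the average over decision
    times equals the average over days.
    """
    days = range(n_days)
    mean_day = sum(days) / n_days
    mean_day_sq = sum(k * k for k in days) / n_days
    d1 = initial_effect
    # Vertex condition d2 + 2 * d3 * max_day = 0 gives d2 = -2 * d3 * max_day;
    # substituting into the average-effect condition leaves one equation in d3.
    d3 = (avg_effect - d1) / (mean_day_sq - 2.0 * max_day * mean_day)
    d2 = -2.0 * d3 * max_day
    return d1, d2, d3

def noncentrality_per_participant(d, availability, rho, n_days=42, times_per_day=5):
    """c_N / N from (8) with constant availability E[I_t] and randomization probability rho."""
    total = 0.0
    for k in range(n_days):
        effect = d[0] + d[1] * k + d[2] * k * k
        total += times_per_day * effect ** 2
    return availability * rho * (1.0 - rho) * total

# Average standardized effect 0.10, no initial effect, maximum at day index 28 (assumed).
d_vec = quadratic_effect_vector(avg_effect=0.10, max_day=28)
print(d_vec)
print(noncentrality_per_participant(d_vec, availability=0.7, rho=0.4))
```

The resulting value of $c_N / N$ is the only quantity a power calculation based on (8) and (9) needs from the effect and availability inputs; whether it leads exactly to the entries of Table I depends on the indexing assumption above.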
Table I: Illustrative sample sizes for HeartSteps. The day of maximal treatment effect is 29. The expected
availability is constant in t.

              E[I_t]
\bar d     0.7    0.6    0.5    0.4
0.10        32     36     42     52
0.09        38     44     51     63
0.08        47     54     64     78
0.07        60     69     81    101
0.06        79     92    109    135
0.05       112    130    155    193

$\bar d = (1/T) \sum_{t=1}^{T} Z_t'd$ is the average standardized treatment effect.
In the behavioral sciences a standardized effect size of 0.2 is considered small [7]. Thus given the very small
standardized effect sizes, the sample sizes given in Table I seem unbelievably small. Two points are worth
making in this regard. First the use of the alternative parametric hypothesis (7) in forming the test statistic,
implies that both between-subject as well as within-subject contrasts in proximal responses are used to detect
the alternative. To see this, note that if we focused on only the first time point, $t = 1$, and tested
$H_0: \beta(1) = 0$, then an appropriate test would be a two-sample $t$-test based on the proximal response $Y_2$,
in which case the required sample size would be much larger (akin to the sample size for a two arm
randomized-controlled trial in which 40% of the subjects are randomized to the treatment arm). This two-sample
$t$-test uses only between-subject contrasts in proximal response to test the hypothesis. The required sample size
would be even larger for a test of $H_0: \beta(1) = 0, \beta(2) = 0$ in which no relationship between $\beta(1)$
and $\beta(2)$ is assumed. Conversely the sample size would be smaller if one focused on detecting alternatives to
$H_0: \beta(1) = 0, \beta(2) = 0$ of the form $H_1: \beta(1) = \beta(2) \ne 0$. The use of the alternative,
$\beta(1) = \beta(2) \ne 0$, allows one to construct tests that use both between-subject as well as within-subject
contrasts in proximal responses. Our approach is in between these two extremes in that we focus on detecting
smooth, in $t$, alternatives to $H_0: \beta(t) = 0$ for all $t$. This permits use of both within- as well as
between-subject contrasts in proximal responses. The assumption of a parsimonious alternative enables the use
of smaller sample sizes. A second point is that, at this time, there is no general understanding of how large the
standardized effect size should be for these “in-the-moment” effects of a treatment. Thus these standardized
effects may or may not be considered small in future.
6 Simulations
We consider a variety of simulations with different generative models to evaluate the performance of the sample size formulae. In the simulations presented here, we use the same setup as in HeartSteps; see Appendix B for simulations in other setups (Table 4B). Specifically, the duration of the study is 42 days and there are 5 decision times within each day ($T = 210$). The randomization probability is 0.4, i.e., $\rho = \rho_t = P(A_t = 1) = 0.4$. The sample size formula is given in (8) and (9). All simulations are based on 1,000 simulated data sets.
Throughout this section the inputs to this sample size formula are $Z_t = \left(1, \lfloor \frac{t-1}{5} \rfloor, \lfloor \frac{t-1}{5} \rfloor^2\right)'$, the time-varying availability pattern, $\tau_t = E[I_t]$, $d$, $\alpha_0 = .05$ and power, $1 - \beta_0 = .80$. The value for the vector $d$ is indirectly specified via (a) the time at which the maximal standardized proximal main effect is achieved ($\arg\max_t Z_t' d$), (b) the averaged across time, standardized proximal main effect $\bar{d} = \frac{1}{T} \sum_{t=1}^T Z_t' d$ and (c) no initial standardized proximal main effect ($Z_1' d = d_1 = 0$). The test statistic used to evaluate the sample size formula is given by (6) in which $B_t$ and $Z_t$ are set to $\left(1, \lfloor \frac{t-1}{5} \rfloor, \lfloor \frac{t-1}{5} \rfloor^2\right)'$.
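The three constraints (a)-(c) pin down $d$ when $Z_t' d$ is quadratic in the day index. A minimal Python sketch of that calculation follows. The function name, and the convention that day $m$ corresponds to $\lfloor (t-1)/5 \rfloor = m - 1$ with the vertex of the quadratic placed exactly at that day, are assumptions made for illustration and are not taken from the paper.

```python
def quadratic_effect_coefficients(avg_effect, max_day, n_days=42):
    """Return (d1, d2, d3) such that Z_t'd = d1 + d2*x + d3*x**2, x = floor((t-1)/5).

    Assumption (not from the paper): the peak sits at x = max_day - 1.
    """
    # x takes the values 0..n_days-1, each repeated once per decision time,
    # so averaging over decision times equals averaging over days.
    xs = range(n_days)
    mean_x = sum(xs) / n_days
    mean_x2 = sum(x * x for x in xs) / n_days
    x_star = max_day - 1

    # (c) d1 = 0
    # (a) vertex at x_star:          d2 = -2 * d3 * x_star
    # (b) average effect constraint: d2 * mean_x + d3 * mean_x2 = avg_effect
    d3 = avg_effect / (mean_x2 - 2.0 * x_star * mean_x)
    if d3 >= 0:
        # A non-negative d3 would make the vertex a minimum, not a maximum.
        raise ValueError("no concave quadratic satisfies these constraints")
    d2 = -2.0 * d3 * x_star
    return 0.0, d2, d3


d1, d2, d3 = quadratic_effect_coefficients(avg_effect=0.1, max_day=29)
print(d1, d2, d3)
```

Under these assumptions, an average effect of 0.1 with the maximum on day 29 gives coefficients that agree, to three significant figures, with the values of $d_2$ and $d_3$ quoted in Section 6.2.4.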
The simulation results provided below illustrate that the sample size formula and associated test statistic are robust. For convenience we summarize the results here. When the working assumptions hold, then under a variety of availability patterns, i.e., time-varying values for $\tau_t = E[I_t]$ (see Figure 1), the desired Type 1 error and power are preserved. This is also the case when past treatment impacts availability. Furthermore the sample size formula is robust to deviations from the working assumptions, that is, provides the desired Type 1 error and power; this is true for a variety of forms of the true proximal main effect of the treatment (see Figure 2), a variety of distributions and correlation patterns for the errors, and dependence of $Y_{t+1}$ on past treatment. In all cases the above robustness occurs as long as we provide an approximately true or conservative value for the standardized effect, $d$, and if we provide an approximately true or conservative (low) value for the availability, $E[I_t]$.
In our simulations, we note several areas in which the sample size formula is less robust to violations of working assumption (c); this is when the error variance in $Y_{t+1}$ varies depending on whether treatment $A_t = 1$ or $A_t = 0$, or with time $t$. In particular if the ratio $\mathrm{Var}[Y_{t+1} | I_t = 1, A_t = 1] / \mathrm{Var}[Y_{t+1} | I_t = 1, A_t = 0] > 1$, then the power is reduced (Section 6.2.3 reports simulated powers below .80 when $\sigma_{1t} > \sigma_{0t}$). Also if the average variance, $E\left[\mathrm{Var}[Y_{t+1} | I_t = 1, A_t]\right]$, varies greatly with time $t$, then the power is reduced. See below for details. Lastly, as would be expected for any sample size formula, using values of the standardized effect size, $d$, or availability that are larger than the truth degrades the power of the procedure.
6.1 Working Assumptions Underlying Sample Size Formula are True
First, we considered a variety of settings in which the working assumptions (a)-(d) hold and in which the inputs to the sample size formula are correct ($d$ is correct under the alternate hypothesis and the time-varying availability $E[I_t]$ is correct). Neither the working assumptions nor the inputs to the sample size formula specify the error distribution, thus in the simulation we consider 5 distributions for the errors in the model for $Y_{t+1}$ including independent normal, Student's $t$ and exponential distributions as well as two autoregressive (AR) processes; all of these error patterns satisfy $\bar{\sigma}^2 = 1$ (recall $\bar{\sigma}^2 = (1/T) \sum_{t=1}^T E\left[\mathrm{Var}(Y_{t+1} | I_t = 1, A_t)\right]$). Furthermore neither the working assumptions nor the inputs to the sample size formula specify the dependence of the availability indicator, $I_t$, on past treatment. Thus we consider settings in which the availability decreases as the number of recent treatments increases. For brevity, we provide these standard results in Appendix B (Tables 2B and 3B). The results are generally quite good, with very few Type 1 error rates significantly above .05 and very few power levels significantly below .80.
[Figure 1: four panels titled Pattern 1, Pattern 2, Pattern 3 and Pattern 4; x-axis: Time (0 to 200); y-axis: Availability (0.40 to 0.60).]
Figure 1: Availability Patterns. The x-axis is decision time point and y-axis is the expected availability. Pattern 2 represents availability varying by day of the week with higher availability on the weekends and lower mid-week. The average availability is 0.5 in all cases.
6.2 Working Assumptions Underlying Sample Size Formula are False
Second, we considered a variety of settings in which the working assumptions are false but the inputs to the sample size formula are approximately correct as follows. Throughout, $\bar{\sigma}^2 = 1$.
6.2.1 Working Assumption (a) is Violated.
Suppose that the true $E[Y_{t+1} | I_t = 1] \neq B_t' \alpha$ for any $\alpha \in \mathbb{R}^q$. In particular, we consider the scenario in which there is a "weekend" effect on $Y_{t+1}$; see another scenario in Appendix B. The data are generated as follows,
$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t' d + \epsilon_t, \quad \text{if } I_t = 1$$
where the conditional mean $\alpha(t) = B_t' \alpha + W_t \theta$. $W_t$ is a binary variable: $W_t = 1$ if the day containing time $t$ is a weekend day, and $W_t = 0$ if the day is a weekday. For simplicity, we assume each subject starts on Monday, i.e., for $k = 1, \dots, 6$, $W_{i + 35(k-1)} = 0$ when $i = 1, \dots, 25$, and $W_{i + 35(k-1)} = 1$ when $i = 26, \dots, 35$ (recall that we assume in the simulation that there are 5 decision time points per day and the length of the study is 6 weeks). The values of $\{\alpha_i, i = 1, 2, 3\}$ are determined by setting $\alpha(1) = 2.5$, $\arg\max_t \alpha(t) = T$, $(1/T) \sum_{t=1}^T \alpha(t) - \alpha(1) = 0.1$. The error terms $\{\epsilon_t\}_{t=1}^T$ are i.i.d. $N(0, 1)$. The day of maximal proximal effect is 29. Additionally, different values of the averaged standardized treatment effect and four patterns of availability as shown in Figure 1, each with average 0.5, are considered. The type I error rate is not affected and is thus omitted here. The simulated power is reported in Table II; for more details see Table 6B in Appendix B.
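The weekend indicator and the resulting conditional mean can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the coefficient values passed in the usage line are placeholders rather than the values used in the simulations.

```python
def weekend_indicator(n_weeks=6, times_per_day=5):
    """W_t for t = 1..T as a list (index t-1), with every subject starting on a Monday."""
    w = []
    for _week in range(n_weeks):
        for day in range(7):                      # day 0 = Monday, ..., day 6 = Sunday
            is_weekend = 1 if day >= 5 else 0
            w.extend([is_weekend] * times_per_day)
    return w


def conditional_mean(t, alpha, theta, w, times_per_day=5):
    """alpha(t) = B_t' alpha + W_t * theta with B_t = (1, x, x^2)', x = floor((t-1)/5)."""
    x = (t - 1) // times_per_day
    return alpha[0] + alpha[1] * x + alpha[2] * x * x + theta * w[t - 1]


w = weekend_indicator()
# Placeholder coefficients, chosen only to show the call.
print(conditional_mean(26, alpha=(2.5, 0.0, 0.0), theta=0.05, w=w))
```

Within each block of 35 decision times the first 25 entries are 0 and the last 10 are 1, which matches the indexing $W_{i+35(k-1)}$ in the text.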
Table II: Simulated power when working assumption (a) is violated. The patterns of availability are provided in Figure 1.

                                    Availability Pattern
$\theta$           $\bar{d}$    Pattern 1    Pattern 2    Pattern 3
$0.5\bar{d}$       0.10         0.80         0.79         0.81
                   0.06         0.78         0.83         0.81
$1\bar{d}$         0.10         0.79         0.78         0.78
                   0.06         0.78         0.79         0.79
$1.5\bar{d}$       0.10         0.78         0.81         0.78
                   0.06         0.77         0.81         0.82
$2\bar{d}$         0.10         0.78         0.79         0.79
                   0.06         0.81         0.79         0.78

$\theta$ is the coefficient of $W_t$ in $E[Y_{t+1} | I_t = 1]$. $\bar{d} = (1/T) \sum_{t=1}^T Z_t' d$ is the average standardized treatment effect. Bold numbers are significantly (at .05 level) lower than .80.
6.2.2 Working Assumption (b) is Violated.
Suppose that the true $\beta(t) \neq Z_t' \beta$ for any $\beta$. Instead the vector of standardized effects, $d$, used in the sample size formula corresponds to the projection of $d(t)$, that is, $d = \left(\sum_{t=1}^T E[I_t] Z_t Z_t'\right)^{-1} \sum_{t=1}^T E[I_t] Z_t d(t)$ (recall $d(t) = \beta(t)/\bar{\sigma}$ and $\rho_t = \rho$). The sample size formula is used with the correct availability pattern, $\{E[I_t]\}_{t=1}^T$. The data for each simulated subject are generated sequentially as follows. For each time $t$,
$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) d(t) + \epsilon_t, \quad \text{if } I_t = 1$$
for the variety of $d(t) = \beta(t)/\bar{\sigma}$ and $E[I_t]$ patterns provided in Figure 2 and in Figure 1 respectively. The average availability is 0.5. The error terms $\{\epsilon_t\}_{t=1}^T$ are generated as i.i.d. $N(0, 1)$. The conditional mean, $E[Y_{t+1} | I_t = 1] = \alpha(t)$, is given by $\alpha(t) = \alpha_1 + \alpha_2 \lfloor \frac{t-1}{5} \rfloor + \alpha_3 \lfloor \frac{t-1}{5} \rfloor^2$, where $\alpha_1 = 2.5$, $\alpha_2 = 0.727$, $\alpha_3 = -8.66 \times 10^{-4}$ (so that $(1/T) \sum_t \alpha(t) - \alpha(1) = 1$, $\arg\max_t \alpha(t) = T$).
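The projection that defines $d$ is an availability-weighted least squares fit of $d(t)$ on $Z_t$. The following Python sketch computes it with a small hand-written linear solver so that it has no dependencies; all function names are hypothetical and the code is an illustration of the formula above, not the authors' implementation.

```python
def z_vector(t, times_per_day=5):
    """Z_t = (1, x, x^2)' with x = floor((t-1)/5)."""
    x = (t - 1) // times_per_day
    return [1.0, float(x), float(x * x)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def project_effect(d_of_t, tau):
    """d = (sum_t tau_t Z_t Z_t')^{-1} sum_t tau_t Z_t d(t); lists are indexed by t-1."""
    gram = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for t in range(1, len(tau) + 1):
        z = z_vector(t)
        for i in range(3):
            rhs[i] += tau[t - 1] * z[i] * d_of_t[t - 1]
            for j in range(3):
                gram[i][j] += tau[t - 1] * z[i] * z[j]
    return solve_linear(gram, rhs)


# A constant effect of 0.1 projects onto (approximately) d = (0.1, 0, 0).
print(project_effect([0.1] * 210, [0.5] * 210))
```

When $d(t)$ is already of the form $Z_t' d$, the projection returns that same $d$, which is why the sample size formula remains accurate for effect shapes that are well approximated by a quadratic in time.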
Table III: Simulated power when working assumption (b) is violated. The shape of the standardized proximal effect and pattern for availability are provided in Figures 2 and 1 respectively. The sample sizes are given on the right.

                                               Shape of d(t)
$\bar{d}$   Availability Pattern   Max    Maintained   Degraded    Sample Size
0.10        Pattern 1              15     0.78         0.79        43    39
                                   29     0.80         0.79        38    38
            Pattern 2              15     0.79         0.80        43    39
                                   29     0.78         0.79        38    38
            Pattern 3              15     0.81         0.77        45    41
                                   29     0.81         0.78        37    39
0.06        Pattern 1              15     0.81         0.79        111   100
                                   29     0.81         0.79        96    96
            Pattern 2              15     0.79         0.81        112   100
                                   29     0.79         0.80        96    96
            Pattern 3              15     0.78         0.81        116   106
                                   29     0.80         0.80        95    101

$\bar{d} = (1/T) \sum_{t=1}^T Z_t' d$ is the average standardized treatment effect. "Max" refers to the day of maximal proximal effect. Bold numbers are significantly (at .05 level) lower than .80.
[Figure 2: four panels; x-axis: Time (0 to 200); y-axis: Proximal Effect (0.00 to 0.20).]
Figure 2: Proximal Main Effects of Treatment, $\{d(t)\}_{t=1}^T$: representing maintained and severely degraded time-varying proximal treatment effects. The horizontal axis is the decision time point. The vertical axis is the standardized treatment effect. The "Max" in the titles refers to the day of maximal proximal effect. The average standardized proximal effect is $\bar{d} = 0.1$ in all plots.
The simulated powers are provided in Table III. In all cases the power is close to .80; this is because all of the proximal main effect patterns in Figure 2 are sufficiently well approximated by a quadratic in time. See Appendix B for other cases of $d(t)$ and details (Figure 5 and Table 9B).
6.2.3 Working Assumption (c) is Violated.
Suppose that $\mathrm{Var}[Y_{t+1} | I_t = 1, A_t] = A_t \sigma_{1t}^2 + (1 - A_t) \sigma_{0t}^2$ where $\sigma_{1t} / \sigma_{0t} \neq 1$. The sample size formula is used with the correct pattern for $\{Z_t' d, E[I_t]\}_{t=1}^T$. The data for each simulated subject are generated sequentially as follows. For each time $t$,
$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t' d + 1\{A_t = 1\} \sigma_{1t} \epsilon_t + 1\{A_t = 0\} \sigma_{0t} \epsilon_t, \quad \text{if } I_t = 1$$
where the average across time standardized proximal main effect, $\bar{d} = \frac{1}{T} \sum_{t=1}^T Z_t' d$, is 0.1 and the day of maximal effect is equal to 22 or 29. The function $\alpha(t) = E[Y_{t+1} | I_t = 1]$ is as in the prior simulation. The availability is $\tau_t = 0.5$. The error terms $\{\epsilon_t\}$ follow a normal AR(1) process, i.e., $\epsilon_t = \phi \epsilon_{t-1} + v_t$, with the variance of $v_t$ scaled so that $\mathrm{Var}[\epsilon_t] = 1$. Define $\bar{\sigma}_t^2 = E\left[\mathrm{Var}[Y_{t+1} | I_t = 1, A_t]\right]$ $\left(= \rho \sigma_{1t}^2 + (1 - \rho) \sigma_{0t}^2\right)$. Recall the average variance $\bar{\sigma}^2$ is given by $(1/T) \sum_{t=1}^T \bar{\sigma}_t^2$. We consider 3 time-varying trends for $\{\bar{\sigma}_t\}$ together with different values of $\sigma_{1t} / \sigma_{0t}$; see Figure 3. In each trend, $\bar{\sigma}_t^2$ is scaled such that $\bar{\sigma} = 1$; thus the standardized proximal main effect in the generative model is $Z_t' d$. In all cases, the simulated type I error rates are close to .05 and thus the table is omitted here (see Appendix B, Table 10B). The simulated power is given in Table IV.
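Two small calculations are implicit in this setup: splitting $\bar{\sigma}_t^2$ into arm-specific standard deviations for a given ratio $\sigma_{1t}/\sigma_{0t}$, and scaling the AR(1) innovations so that $\mathrm{Var}[\epsilon_t] = 1$. The Python sketch below shows one way to do both. The function names are hypothetical, and starting the AR(1) chain from its stationary distribution is an assumption, since the text does not say how the process is initialized.

```python
import math
import random


def arm_sigmas(sigma_bar_t, ratio, rho=0.4):
    """Return (sigma_1t, sigma_0t) with sigma_1t / sigma_0t = ratio and
    rho * sigma_1t^2 + (1 - rho) * sigma_0t^2 = sigma_bar_t^2."""
    sigma0 = sigma_bar_t / math.sqrt(rho * ratio ** 2 + (1.0 - rho))
    return ratio * sigma0, sigma0


def ar1_errors(T, phi, rng):
    """Gaussian AR(1) errors eps_t = phi * eps_{t-1} + v_t with Var[eps_t] = 1."""
    innov_sd = math.sqrt(1.0 - phi ** 2)      # Var[v_t] = 1 - phi^2 keeps Var[eps_t] at 1
    eps = [rng.gauss(0.0, 1.0)]               # assumed: start in the stationary distribution
    for _ in range(T - 1):
        eps.append(phi * eps[-1] + rng.gauss(0.0, innov_sd))
    return eps


sigma1, sigma0 = arm_sigmas(sigma_bar_t=1.0, ratio=1.2)
errors = ar1_errors(T=210, phi=0.6, rng=random.Random(0))
print(sigma1, sigma0, len(errors))
```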
Table IV: Simulated power when working assumption (c) is violated, $\sigma_{1t} \neq \sigma_{0t}$. The trends are provided in Figure 3. The availability is 0.5. The average proximal main effect is $\bar{d} = 0.1$ and the day of maximal effect is 22 or 29; thus the associated sample sizes are 41 and 42.

                                       Max = 22 (N = 41)                Max = 29 (N = 42)
$\phi$   $\sigma_{1t}/\sigma_{0t}$   trend 1   trend 2   trend 3     trend 1   trend 2   trend 3
-0.6     0.8                         0.83      0.84      0.80        0.81      0.89      0.79
         1.0                         0.79      0.80      0.75        0.74      0.85      0.70
         1.2                         0.76      0.76      0.71        0.72      0.81      0.70
0        0.8                         0.85      0.82      0.79        0.81      0.88      0.78
         1.0                         0.79      0.81      0.74        0.77      0.86      0.72
         1.2                         0.77      0.77      0.71        0.70      0.83      0.70
0.6      0.8                         0.83      0.83      0.81        0.77      0.87      0.77
         1.0                         0.76      0.79      0.75        0.73      0.85      0.77
         1.2                         0.78      0.77      0.73        0.72      0.82      0.69

$\phi$ is the parameter in AR(1) for $\{\epsilon_t\}_{t=1}^T$. "Max" is the day in which the maximal proximal effect is attained. Bold numbers are significantly (at .05 level) lower than .80.
[Figure 3: three panels titled Trend 1, Trend 2 and Trend 3; x-axis: Time (0 to 200); y-axis: Sigma (0.8 to 1.2 for Trends 1 and 2, 0.8 to 1.4 for Trend 3).]
Figure 3: Trend of $\bar{\sigma}_t$: For all trends, $\bar{\sigma}_t^2$ is scaled so that $(1/T) \sum_{t=1}^T \bar{\sigma}_t^2 = 1$. In Trend 3, the variance, $\bar{\sigma}_t^2 = E\left[\mathrm{Var}[Y_{t+1} | I_t = 1, A_t]\right]$, peaks on weekends. In particular, $\bar{\sigma}_{7k+i} = 0.8$ for $i = 1, \dots, 5$ and $\bar{\sigma}_{7k+i} = 1.5$ for $i = 6, 7$.
In the case of $\sigma_{1t} < \sigma_{0t}$, the simulated powers are slightly larger than 0.8, while the simulated powers are smaller than 0.8 in the case of $\sigma_{1t} > \sigma_{0t}$. The impact of $\bar{\sigma}_t$ on the power depends on the shape of the treatment effect: when $\beta(t)$ attains its maximum more than halfway through the study, at day 29, an increasing $\{\bar{\sigma}_t\}$, trend 1, lowers the power, while a decreasing $\{\bar{\sigma}_t\}$, trend 2, improves the power. When $\beta(t)$ attains a maximal effect midway through the study, neither a decreasing nor an increasing $\{\bar{\sigma}_t\}$ impacts power. A large variation in $\bar{\sigma}_t$, e.g. trend 3, reduces the power in all cases. The differing autocorrelations of the errors, $\epsilon_t$, do not affect power; see a more detailed table in Appendix B, Table 10B.
6.2.4 Working Assumption (d) is Violated
We violate assumption (d) by making both the availability indicator, $I_t$, and proximal response, $Y_{t+1}$, depend on past treatment and past proximal responses. The sample size formula is used with the correct value of $\{Z_t' d, E[I_t]\}_{t=1}^T$; in particular $d$ is determined by an average proximal main effect of $\bar{d} = 0.1$, day of maximal effect equal to 29 ($d_1 = 0$, $d_2 = 9.64 \times 10^{-3}$, $d_3 = -1.72 \times 10^{-4}$) and with a constant availability pattern equal to 0.5. The data for each simulated subject are generated as follows. Denote the cumulative treatment over the last 24 hours by $C_t = \sum_{j=1}^5 A_{t-j} I_{t-j}$. At each time $t$,
$$I_t \sim \mathrm{Ber}\left(\tau_t + \tau_t \eta_1 (C_t - E[C_t]) + \tau_t \eta_2 \, \mathrm{Trunc}\left(\tfrac{1}{5} \sum_{j=1}^5 \epsilon_{t-j}\right)\right), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \begin{cases} \alpha(t) + \gamma_1 \left[C_t - E[C_t | I_t = 1]\right] + (A_t - \rho) \left[Z_t' d + Z_t' d \, \gamma_2 (C_t - E[C_t | I_t = 1])\right] + \sigma \epsilon_t & \text{if } I_t = 1 \\ \alpha_0(t) + \epsilon_t & \text{if } I_t = 0 \end{cases}$$
where $\{\epsilon_t\}_{t=1}^T$ are i.i.d. $N(0, 1)$ and $\mathrm{Trunc}(x) := x \, 1_{|x| \le 1} + \mathrm{sign}(x) \, 1_{|x| > 1}$ (the truncation is used to ensure that $\tau_t + \tau_t \eta_1 (C_t - E[C_t]) + \tau_t \eta_2 \, \mathrm{Trunc}(\tfrac{1}{5} \sum_{j=1}^5 \epsilon_{t-j}) \in [0, 1]$). Again $\alpha(t)$ is as in the prior simulation. $\sigma$ is calculated such that the average variance is equal to 1, i.e., $\bar{\sigma}^2 = \frac{1}{T} \sum_{t=1}^T E\left[\mathrm{Var}[Y_{t+1} | I_t = 1, A_t]\right] = 1$. Note that since $C_t$ is centered in both the model for $I_t$ as well as in the model for $Y_{t+1}$, the standardized proximal main effect is $Z_t' d$ and $E[I_t] = \tau_t = 0.5$. $\alpha_0(t)$ is the conditional mean of $Y_{t+1}$ when $I_t = 0$. The form of $E[Y_{t+1} | I_t = 0]$ is not essential: only $Y_{s+1} - E[Y_{s+1} | I_s = 0]$ is used to generate $I_t$. In the simulation, $E[C_t | I_t = 1]$ and $\sigma$ are calculated by Monte Carlo methods. As before, the simulated type I error rates are not affected; see Table 11B in Appendix B. The simulated powers are provided in Table V.
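The building blocks of the availability model (the truncation, the cumulative treatment $C_t$ and the Bernoulli success probability) can be sketched in Python as follows. The function names are hypothetical, and treating decision times before $t = 1$ as contributing zero to $C_t$ is an assumption, since the text does not specify the start-up convention.

```python
def trunc(x):
    """Trunc(x) = x if |x| <= 1, otherwise sign(x)."""
    if abs(x) <= 1.0:
        return x
    return 1.0 if x > 0 else -1.0


def cumulative_treatment(a_hist, i_hist, t, window=5):
    """C_t = sum_{j=1}^{5} A_{t-j} I_{t-j}; a_hist[s-1] and i_hist[s-1] hold A_s and I_s.

    Assumption: times s < 1 (before the study starts) contribute zero.
    """
    total = 0
    for j in range(1, window + 1):
        s = t - j
        if s >= 1:
            total += a_hist[s - 1] * i_hist[s - 1]
    return total


def availability_probability(tau_t, eta1, eta2, c_t, mean_c_t, recent_eps):
    """tau_t + tau_t*eta1*(C_t - E[C_t]) + tau_t*eta2*Trunc((1/5) * sum of last 5 errors)."""
    avg_eps = sum(recent_eps) / 5.0
    return tau_t + tau_t * eta1 * (c_t - mean_c_t) + tau_t * eta2 * trunc(avg_eps)


c7 = cumulative_treatment([1, 0, 1, 1, 1, 0], [1, 1, 0, 1, 1, 1], t=7)
print(c7, availability_probability(0.5, 0.1, 0.1, c7, 1.0, [0.0] * 5))
```

In a full simulation, `mean_c_t` would itself have to be estimated, for example by Monte Carlo as the text does for $E[C_t | I_t = 1]$.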
Table V: Simulated power when working assumption (d) is false. The expected availability is 0.5, the average proximal main effect is $\bar{d} = 0.1$ and the maximal effect is attained at day 29. The associated sample size is 42.

                                                     $\gamma_1$
Parameters in $I_t$             $\gamma_2$    -0.1     -0.2     -0.3
$\eta_1 = 0.1, \eta_2 = 0.1$    -0.2          0.80     0.81     0.79
                                -0.5          0.79     0.81     0.80
                                -0.8          0.81     0.82     0.79
$\eta_1 = 0.2, \eta_2 = 0.1$    -0.2          0.78     0.82     0.79
                                -0.5          0.81     0.77     0.77
                                -0.8          0.81     0.79     0.78
$\eta_1 = 0.1, \eta_2 = 0.2$    -0.2          0.78     0.78     0.80
                                -0.5          0.80     0.79     0.78
                                -0.8          0.78     0.79     0.80

$\gamma_1$, $\gamma_2$ are parameters for the cumulative treatments in the model of $Y_{t+1}$; $\eta_1$, $\eta_2$ are parameters in the model of $I_t$. Bold numbers are significantly (at .05 level) less than .80.
6.3 Some Practical Guidelines
Third, it is critical to use conservative values of $d$ and availability $E[I_t]$ in the sample size formula. It is not surprising that the quality of the sample size formula depends on accurate or conservative values of the standardized effects, $d$, as this is the case for all sample size formulas. Additionally availability provides the number of decision points at which treatment might be provided per individual and thus the sample size formula should be sensitive to availability. To illustrate these points we consider a simulation in which the data are generated by
$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t' d + \epsilon_t, \quad \text{if } I_t = 1$$
where the $\epsilon_t$'s are i.i.d. standard normals and $\alpha(t)$ is as in the prior simulations. First suppose the scientist provides the correct availability pattern, $\{E[I_t]\}_{t=1}^T$, the correct time at which the maximal standardized proximal main effect is achieved ($\arg\max_t Z_t' d$) and the correct initial standardized proximal main effect ($Z_1' d = d_1 = 0$) but provides too low a value of the averaged across time, standardized proximal main effect $\bar{d} = \frac{1}{T} \sum_{t=1}^T Z_t' d$. The simulated power is provided in Appendix B, Table 12B. The degradation in power is pronounced as might be expected.
Second, suppose the scientist provides the correct $\arg\max_t Z_t' d$, correct $Z_1' d = d_1 = 0$, correct $\bar{d} = \frac{1}{T} \sum_{t=1}^T Z_t' d$ and although the scientist's time-varying pattern of availability is correct, the magnitude is underestimated. The simulation result is in Appendix B, Table 13B. Again the degradation in power is pronounced.
7 Discussion
In this paper, we have introduced the use of micro-randomized trials in mobile health and have provided an
approach to determining the sample size. More sophisticated sample size procedures might be entertained.
Certainly it makes sense to include baseline information in the sample size procedure; for example, in HeartSteps a natural baseline variable is baseline step count. The inclusion of baseline variables in $B_t$ in the regression (2) is straightforward. An interesting generalization to the sample size procedure would allow scientists to include time-varying variables (in $S_t$) as covariates in $B_t$ in the regression (2). This might be a useful strategy for reducing the error variance.
Although this paper has focused on determining the sample size to detect the proximal main effect of a treatment with a given power, micro-randomized studies provide data for a variety of interesting further analyses. For example, it is of some interest to model and understand the predictors of the time-varying availability indicator. In the case of HeartSteps we will know why the participant is unavailable (driving a car, already active or has turned off the lock-screen messages) so we will be able to consider each type of availability indicator. Other very interesting further analyses include assessing interactions between treatment, $A_t$, and context, $S_t$, or past treatment, $A_s$, $s < t$, on the proximal response, $Y_{t+1}$. Also there is much interest in using this type of data to construct “dynamic treatment regimes”; in this setting these are called Just-in-Time Adaptive Interventions [26]. The sequential micro-randomizations enhance all of these analyses by reducing causal confounding.
Appendix A Theoretical Results and Proofs
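Lemma 1 (Least Squares Estimator). The least squares estimators $\hat{\alpha}, \hat{\beta}$ are consistent estimators of $\tilde{\alpha}, \tilde{\beta}$ in (4) and (5). In particular, if $\beta(t) = Z_t' \beta$ for some vector $\beta$, then $\tilde{\beta} = \beta$. Under moment conditions, we have $\sqrt{N}(\hat{\beta} - \tilde{\beta}) \to N(0, \Sigma_\beta)$, where the asymptotic variance $\Sigma_\beta$ is given by $\Sigma_\beta = Q^{-1} W Q^{-1}$ where $Q = \sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) Z_t Z_t'$,
$$W = E\left[\sum_{t=1}^T \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t \times \sum_{t=1}^T \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t'\right] \quad \text{and} \quad \tilde{\epsilon}_t = Y_{t+1} - B_t' \tilde{\alpha} - Z_t' \tilde{\beta} (A_t - \rho_t).$$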
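Proof. It is easy to see that the least squares estimators satisfy
$$\hat{\theta} = (\hat{\alpha}, \hat{\beta}) = \left(\mathbb{P}_N \sum_{t=1}^T I_t X_t X_t'\right)^{-1} \left(\mathbb{P}_N \sum_{t=1}^T I_t Y_{t+1} X_t\right) \to \left(\sum_{t=1}^T E(I_t X_t X_t')\right)^{-1} \left(\sum_{t=1}^T E(I_t Y_{t+1} X_t)\right)$$
where $X_t' = (B_t', (A_t - \rho_t) Z_t') \in \mathbb{R}^{1 \times (p+q)}$ is the covariate at time $t$. For each $t$,
$$E(I_t X_t X_t') = \begin{pmatrix} E[I_t] B_t B_t' & B_t Z_t' E[I_t (A_t - \rho_t)] \\ Z_t B_t' E[I_t (A_t - \rho_t)] & Z_t Z_t' E[I_t (A_t - \rho_t)^2] \end{pmatrix} = \begin{pmatrix} E[I_t] B_t B_t' & 0 \\ 0 & E[I_t] \rho_t (1 - \rho_t) Z_t Z_t' \end{pmatrix}$$
$$E(I_t Y_{t+1} X_t) = \begin{pmatrix} E[I_t Y_{t+1}] B_t \\ E[I_t Y_{t+1} (A_t - \rho_t)] Z_t \end{pmatrix} = \begin{pmatrix} E[I_t Y_{t+1}] B_t \\ \rho_t (1 - \rho_t) E[I_t] \beta(t) Z_t \end{pmatrix},$$
so that
$$\hat{\alpha} \to \left(\sum_{t=1}^T E[I_t] B_t B_t'\right)^{-1} \sum_{t=1}^T E[I_t Y_{t+1}] B_t = \left(\sum_{t=1}^T E[I_t] B_t B_t'\right)^{-1} \sum_{t=1}^T E[I_t] \alpha(t) B_t$$
$$\hat{\beta} \to \left(\sum_{t=1}^T \rho_t (1 - \rho_t) E[I_t] Z_t Z_t'\right)^{-1} \sum_{t=1}^T E[I_t Y_{t+1} (A_t - \rho_t)] Z_t = \left(\sum_{t=1}^T \rho_t (1 - \rho_t) E[I_t] Z_t Z_t'\right)^{-1} \sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) \beta(t) Z_t$$
as in (4) and (5). We can see that if $\beta(t) = Z_t' \beta$, then
$$\left(\sum_{t=1}^T \rho_t (1 - \rho_t) E[I_t] Z_t Z_t'\right)^{-1} \sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) \beta(t) Z_t = \left(\sum_{t=1}^T \rho_t (1 - \rho_t) E[I_t] Z_t Z_t'\right)^{-1} \sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) Z_t Z_t' \beta = \beta.$$
This is true even if $E[Y_{t+1} | I_t = 1] \neq B_t' \tilde{\alpha}$.
We can easily see that
$$\sqrt{N}(\hat{\theta} - \tilde{\theta}) = \sqrt{N} \left\{ \left(\mathbb{P}_N \sum_{t=1}^T I_t X_t X_t'\right)^{-1} \left[ \left(\mathbb{P}_N \sum_{t=1}^T I_t Y_{t+1} X_t\right) - \left(\mathbb{P}_N \sum_{t=1}^T I_t X_t X_t'\right) \tilde{\theta} \right] \right\}$$
$$= \sqrt{N} \left\{ E\left[\sum_{t=1}^T I_t X_t X_t'\right]^{-1} \left(\mathbb{P}_N \sum_{t=1}^T I_t \tilde{\epsilon}_t X_t\right) \right\} + o_p(1), \qquad (10)$$
where $o_p(1)$ is a term that converges in probability to zero as $N$ goes to infinity. By the definition of $\tilde{\alpha}$ and $\tilde{\beta}$, we have
$$E\left[\sum_{t=1}^T I_t \tilde{\epsilon}_t X_t\right] = \begin{pmatrix} \sum_{t=1}^T E[I_t] \left(\alpha(t) - B_t' \tilde{\alpha}\right) B_t \\ \sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) \left(\beta(t) - Z_t' \tilde{\beta}\right) Z_t \end{pmatrix} = 0.$$
So that under moment conditions, we have $\sqrt{N}(\hat{\theta} - \tilde{\theta}) \to N(0, \Sigma_\theta)$, where $\Sigma_\theta$ is given by
$$\Sigma_\theta = E\left[\sum_{t=1}^T I_t X_t X_t'\right]^{-1} E\left[\sum_{t=1}^T I_t \tilde{\epsilon}_t X_t \times \sum_{t=1}^T I_t \tilde{\epsilon}_t X_t'\right] E\left[\sum_{t=1}^T I_t X_t X_t'\right]^{-1} = \begin{bmatrix} \Sigma_\alpha & \Sigma_{\alpha\beta} \\ \Sigma_{\alpha\beta}' & \Sigma_\beta \end{bmatrix}.$$
In particular, $\hat{\beta}$ satisfies $\sqrt{N}(\hat{\beta} - \tilde{\beta}) \to N(0, \Sigma_\beta)$ and $\Sigma_\beta$ is given by
$$\Sigma_\beta = \left(\sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) Z_t Z_t'\right)^{-1} E\left[\sum_{t=1}^T \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t \times \sum_{t=1}^T \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t'\right] \left(\sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) Z_t Z_t'\right)^{-1} = Q^{-1} W Q^{-1}.$$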
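Lemma 2 (Asymptotic Variance Under Working Assumptions). Assuming working assumptions (a)-(d) are true, then under the alternative hypothesis $H_1$ in (7), $\Sigma_\beta$ and $c_N$ are given by
$$\Sigma_\beta = \bar{\sigma}^2 \left(\sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) Z_t Z_t'\right)^{-1}, \qquad c_N = N d' \left(\sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) Z_t Z_t'\right) d.$$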
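Proof. Note that under assumptions (b) and (c), we have $Z_t' \tilde{\beta} = \beta(t)$ and $\mathrm{Var}(Y_{t+1} | I_t = 1, A_t) = \bar{\sigma}^2$ for each $t$, and $\tilde{d} = d$. The middle term, $W$, in $\Sigma_\beta$ can be separated into two terms, i.e.,
$$E\left[\sum_{t=1}^T \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t \times \sum_{t=1}^T \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t'\right] = \sum_{t=1}^T E\left[\tilde{\epsilon}_t^2 I_t (A_t - \rho_t)^2\right] Z_t Z_t' + \sum_{i \neq j} E\left[\tilde{\epsilon}_i \tilde{\epsilon}_j I_i I_j (A_i - \rho_i)(A_j - \rho_j)\right] Z_i Z_j'.$$
Under assumptions (a), (b) and (c), we have $E[\tilde{\epsilon}_t | I_t = 1, A_t] = 0$ and $E\left[\tilde{\epsilon}_t^2 I_t (A_t - \rho_t)^2\right] = E[I_t] \rho_t (1 - \rho_t) \bar{\sigma}^2$. Furthermore, suppose $i > j$; then
$$E\left[\tilde{\epsilon}_i \tilde{\epsilon}_j I_i I_j (A_i - \rho_i)(A_j - \rho_j)\right] = E\left[I_i I_j (A_j - \rho_j)(A_i - \rho_i)\right] \times E\left[\tilde{\epsilon}_i \tilde{\epsilon}_j | I_i = 1, I_j = 1, A_i, A_j\right] = 0,$$
because $A_i \perp \{I_i, I_j, A_j\}$ and so the first term is 0. $W$ is then given by
$$W = \bar{\sigma}^2 \sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) Z_t Z_t',$$
so that $\Sigma_\beta = \bar{\sigma}^2 \left(\sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) Z_t Z_t'\right)^{-1}$ and $c_N = N (\bar{\sigma} \tilde{d})' \Sigma_\beta^{-1} (\bar{\sigma} \tilde{d}) = N d' \left(\sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) Z_t Z_t'\right) d$.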
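Because $Z_t Z_t'$ enters only through the quadratic form, $c_N$ in Lemma 2 reduces to $N \sum_{t=1}^T E[I_t] \rho_t (1 - \rho_t) (Z_t' d)^2$. The Python sketch below evaluates that sum for the quadratic $Z_t$ used in the simulations. The function name is hypothetical, and the code covers only the noncentrality parameter; it does not reproduce the sample size formula (8)-(9).

```python
def noncentrality(n_subjects, d, tau, rho=0.4, times_per_day=5):
    """c_N = N * sum_t tau_t * rho * (1 - rho) * (Z_t'd)^2 with Z_t = (1, x, x^2)'."""
    total = 0.0
    for t, tau_t in enumerate(tau, start=1):
        x = (t - 1) // times_per_day
        effect = d[0] + d[1] * x + d[2] * x * x      # Z_t'd, the standardized effect at t
        total += tau_t * rho * (1.0 - rho) * effect ** 2
    return n_subjects * total


# Inputs from Section 6.2.4: constant availability 0.5 and the day-29 effect vector.
d = (0.0, 9.64e-3, -1.72e-4)
print(noncentrality(42, d, [0.5] * 210))
```

The value grows linearly in both $N$ and the availability, which is why lower availability or a smaller standardized effect requires a larger sample.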
Remark: Working assumption (d) can be replaced by assuming that $E[Y_{t+1} | I_t = 1, A_t, I_s = 1, A_s] - E[Y_{t+1} | I_t = 1, A_t]$ does not depend on $A_t$ for any $s < t$, or by a Markov-type assumption, $Y_{t+1} \perp \{Y_{s+1}, I_s, A_s, s < t\} \mid I_t, A_t$. Either of them implies $E\left[\tilde{\epsilon}_i \tilde{\epsilon}_j I_i I_j (A_i - \rho_i)(A_j - \rho_j)\right] = 0$, so that $\Sigma_\beta$ and $c_N$ have the same simplified forms.
Rationale for multiple of F distribution
The distribution of the quadratic form, $n (\bar{X} - \mu)' \hat{\Sigma}^{-1} (\bar{X} - \mu)$, constructed from a random sample of size $n$ of $N(\mu, \Sigma)$ random variables in which $\hat{\Sigma}$ is the sample covariance matrix follows a Hotelling's $T$-squared distribution. The Hotelling's $T$-squared distribution is a multiple of the F distribution, $\frac{d_1 (d_1 + d_2 - 1)}{d_2} F_{d_1, d_2}$, in which $d_1$ is the dimension of $\mu$, and $d_2$ is the sample size minus $d_1$. Our small sample approximation replaces $d_1$ by $p$ (the number of parameters in the test statistic) and $d_2$ by $n - q - p$ (the sample size minus the number of nuisance parameters minus $d_1$).
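Formula for adjusted $\hat{W}$ and $\hat{Q}$
Define an individual-specific residual vector $\hat{e}$ as the $T \times 1$ vector with $t$th entry $\hat{e}_t = Y_{t+1} - I_t B_t' \hat{\alpha} - I_t (A_t - \rho_t) Z_t' \hat{\beta}$. For each individual define the $t$th row of the $T \times (p + q)$ individual-specific matrix $X$ by $(I_t B_t', I_t (A_t - \rho_t) Z_t')$. Then define $H = X \left[\mathbb{P}_N X' X\right]^{-1} X'$. The matrix $\hat{Q}^{-1}$ is given by the lower right $p \times p$ block in the inverse of $\left[\mathbb{P}_N X' X\right]$; the matrix $\hat{W}$ is given by the lower right $p \times p$ block in $\mathbb{P}_N \left[X' (I - H)^{-1} \hat{e} \hat{e}' (I - H)^{-1} X\right]$.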
Appendix B Further Simulations and Details
B.1 Simulation Results When Working Assumptions are True
We conduct a variety of simulations in settings in which the working assumptions hold, the scientist provides the correct pattern for the expected availability, $\tau_t = E[I_t]$, and under the alternate, the standardized proximal main effect is $d(t) = Z_t' d$. Here we will mainly focus on the setup where the duration of the study is 42 days and there are 5 decision times within each day, but similar results can be obtained in different setups; see below. The randomization probability is 0.4, i.e., $\rho = \rho_t = P(A_t = 1) = 0.4$. The sample size formula is given in (8) and (9). The test statistic is given by (6) in which $B_t$ and $Z_t$ are equal to $\left(1, \lfloor \frac{t-1}{5} \rfloor, \lfloor \frac{t-1}{5} \rfloor^2\right)'$. All simulations are based on 1,000 simulated data sets. The significance level is 0.05 and the desired power is 80%.
In the first simulation, the data for each simulated subject are generated sequentially as follows. For $t = 1, \dots, T = 210$, $I_t$, $A_t$ and $Y_{t+1}$ are generated by
$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) d(t) + \epsilon_t, \quad \text{if } I_t = 1$$
where $d(t) = Z_t' d$ and $\tau_t$ are the same as in the sample size model. The conditional mean, $E[Y_{t+1} | I_t = 1] = \alpha(t)$, is given by $\alpha(t) = \alpha_1 + \alpha_2 \lfloor \frac{t-1}{5} \rfloor + \alpha_3 \lfloor \frac{t-1}{5} \rfloor^2$, where $\alpha_1 = 2.5$, $\alpha_2 = 0.727$, $\alpha_3 = -8.66 \times 10^{-4}$ (so that $(1/T) \sum_t \alpha(t) - \alpha(1) = 1$, $\arg\max_t \alpha(t) = T$). We consider 5 differing distributions for the errors $\{\epsilon_t\}_{t=1}^T$: independent normal; independent (scaled) Student's $t$ distribution with 3 degrees of freedom; independent (centered) exponential distribution with $\lambda = 1$; a Gaussian AR(1) process, i.e., $\epsilon_t = \phi \epsilon_{t-1} + v_t$, where $v_t$ is white noise with variance $\sigma_v^2$ such that $\mathrm{Var}(\epsilon_t) = 1$; and lastly a Gaussian AR(5) process, i.e., $\epsilon_t = \frac{\phi}{5} \sum_{j=1}^5 \epsilon_{t-j} + v_t$, where $v_t$ is white noise with variance $\sigma_v^2$ such that $\mathrm{Var}(\epsilon_t) = 1$. In all cases the errors are scaled to have mean 0 and variance 1 (i.e., $E[\epsilon_t | I_t = 1] = 0$, $\mathrm{Var}[\epsilon_t | A_t, I_t = 1] = 1$). Additionally four availability patterns, i.e., time-varying values for $\tau_t = E[I_t]$, are considered; see Figure 1. The simulated type 1 error rate and power when the duration of study is 42 days are reported in Tables 2B and 3B. The simulation results in other setups, i.e., when the length of the study is 4 weeks or 8 weeks, are reported in Table 4B. The associated sample sizes are given in Table 1B.
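The generative model of the first simulation can be sketched for a single subject in Python as follows. This is an illustration with i.i.d. standard normal errors only; the function name, the record layout, and the constant `alpha_of_t` and `d_of_t` used in the usage lines are hypothetical placeholders rather than the settings used in the paper.

```python
import random


def simulate_subject(tau, d_of_t, alpha_of_t, rho=0.4, rng=None):
    """Draw (t, I_t, A_t, Y_{t+1}) for t = 1..T under
    I_t ~ Ber(tau_t), A_t ~ Ber(rho),
    Y_{t+1} = alpha(t) + (A_t - rho) * d(t) + eps_t when I_t = 1.

    Y_{t+1} is stored as None when the subject is unavailable (I_t = 0).
    """
    rng = rng or random.Random()
    records = []
    for t in range(1, len(tau) + 1):
        available = 1 if rng.random() < tau[t - 1] else 0
        treated = 1 if rng.random() < rho else 0
        response = None
        if available:
            eps = rng.gauss(0.0, 1.0)                 # i.i.d. N(0, 1) error
            response = alpha_of_t(t) + (treated - rho) * d_of_t(t) + eps
        records.append((t, available, treated, response))
    return records


# Placeholder inputs: constant availability, constant effect, constant mean.
subject = simulate_subject(
    tau=[0.5] * 210,
    d_of_t=lambda t: 0.1,
    alpha_of_t=lambda t: 2.5,
    rng=random.Random(1),
)
print(len(subject), sum(rec[1] for rec in subject))
```

Repeating this for $N$ subjects and 1,000 replications, then applying the test statistic in (6), is what produces the simulated type 1 error rates and powers reported in the tables.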
Neither the working assumptions nor the inputs to the sample size formula specify the dependence of the availability indicator, $I_t$, on past treatment. In the second simulation, we therefore consider the setting in which the availability decreases as the number of treatments provided in the recent past increases. In particular, the data are generated as follows,
$$I_t \sim \mathrm{Ber}\left(\tau_t + \eta \sum_{j=1}^5 \left(A_{t-j} I_{t-j} - E[A_{t-j} I_{t-j}]\right)\right), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) d(t) + \epsilon_t, \quad \text{if } I_t = 1$$
Note that since we center $\sum_{j=1}^5 A_{t-j} I_{t-j}$ in the generative model of $I_t$, the expected availability is $\tau_t$. The specifications of $\alpha(t)$, $\beta(t)$ and $\epsilon_t$ are the same as in the first simulation. The simulated type I error rate and power are reported in Table 5B.
B.2 Further Details When Working Assumptions are False
B.2.1 Working Assumption (a) is Violated.
Here we consider another setting in which the working assumption (a) is violated, i.e., the underlying true $E[Y_{t+1} | I_t = 1]$ follows a non-quadratic form (recall that $B_t$ is given by $\left(1, \lfloor \frac{t-1}{5} \rfloor, \lfloor \frac{t-1}{5} \rfloor^2\right)'$). The data are generated as follows
$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t' d + \epsilon_t, \quad \text{if } I_t = 1$$
where $\alpha(t) = E[Y_{t+1} | I_t = 1]$ is provided in Figure 4. For each case, $\alpha(t)$ satisfies $\alpha(1) = 2.5$ and $(1/T) \sum_{t=1}^T \alpha(t) - \alpha(1) = 0.1$. The error terms $\{\epsilon_t\}_{t=1}^T$ are i.i.d. $N(0, 1)$. The day of maximal proximal effect is assumed to be 29. Additionally, different values of the averaged standardized treatment effect and four patterns of availability in Figure 1 with average 0.5 are considered. The simulation results are reported in Table 7B.
B.2.2 Additional Simulation Results When Other Working Assumptions are False
The main body of the paper reports part of the results when working assumptions (b), (c) and (d) are violated. Additional simulation results are provided here. In particular, the simulation result is reported in Table 9B when $d(t)$ follows other non-quadratic forms, i.e., working assumption (b) is false; see Figure 5. The simulated Type 1 error rate and power when working assumption (c) is false are reported in Table 10B. The simulated Type 1 error rate when working assumption (d) is violated is reported in Table 11B.
B.2.3 Simulation Results when $\bar{d}$ and $\bar{\tau}$ are misspecified.
As discussed in the paper, the first scenario considers the setting in which the scientist provides the correct availability pattern, $\{E[I_t]\}_{t=1}^T$, the correct time at which the maximal standardized proximal main effect is achieved ($\arg\max_t Z_t' d$) and the correct initial standardized proximal main effect ($Z_1' d = d_1 = 0$) but provides too low a value of the averaged across time, standardized proximal main effect $\bar{d} = \frac{1}{T} \sum_{t=1}^T Z_t' d$. The simulated power is provided in Table 12B. In the second scenario, the scientist provides the correct $\arg\max_t Z_t' d$, correct $Z_1' d = d_1 = 0$, correct $\bar{d} = \frac{1}{T} \sum_{t=1}^T Z_t' d$ and although the scientist's time-varying pattern of availability is correct, the magnitude, i.e., the average availability, is underestimated. The simulation result is in Table 13B.
Table 1B: Sample sizes when the proximal treatment effect satisfies $d(t) = Z_t' d$. The significance level is 0.05. The desired power is 0.80.
Duration of Study   Availability Pattern   Max   $\bar{\tau} = 0.5$    $\bar{\tau} = 0.7$
Average Proximal Effect                          0.10 0.08 0.06        0.10 0.08 0.06
4-week
Pattern 1
15 59 89 154 43 65 112
22 60 91 158 44 66 114
29 58 87 152 43 64 110
Pattern 2
15 59 89 154 43 65 112
22 60 92 159 44 67 115
29 58 89 154 43 64 111
Pattern 3
15 59 90 157 44 66 113
22 63 96 167 46 69 119
29 62 94 163 45 67 115
Pattern 4
15 59 89 155 43 65 112
22 57 86 150 43 64 110
29 54 82 142 41 61 105
6-week
Pattern 1
22 41 61 105 31 45 76
29 42 64 109 32 47 79
36 41 62 106 31 45 77
Pattern 2
22 41 61 105 31 45 76
29 43 64 110 32 47 80
36 42 62 107 31 46 77
Pattern 3
22 42 62 106 31 46 77
29 44 66 114 33 48 82
36 43 65 112 32 47 80
Pattern 4
22 41 62 106 31 45 77
29 41 62 106 31 46 78
36 40 59 101 30 44 74
8-week
Pattern 1
29 32 47 80 25 35 58
36 33 49 84 26 37 61
43 33 48 82 25 36 60
Pattern 2
29 32 47 80 25 35 58
36 34 49 84 26 37 61
43 33 49 82 25 36 60
Pattern 3
29 33 48 82 25 36 59
36 35 51 87 26 38 63
43 34 50 86 26 37 62
Pattern 4
29 33 48 81 25 36 59
36 33 49 83 25 36 61
43 32 47 80 25 35 59
“Max” is the day in which the maximal proximal effect is attained. $\bar{\tau} = (1/T) \sum_{t=1}^T E[I_t]$ is the average availability.
Table 2B: Simulated Type I error rate (%) when working assumptions are true. The duration of the study is 6 weeks. The associated sample size is given in Table 1B.
Error Term   Availability Pattern   Max   $\bar{\tau} = 0.5$    $\bar{\tau} = 0.7$
Average Proximal Effect                   0.10 0.08 0.06        0.10 0.08 0.06
i.i.d. Normal
Pattern 1
22 3.8 4.5 4.9 4.6 5.3 4.8
29 4.7 6.0 4.6 4.0 3.2 5.0
36 5.0 5.4 4.9 4.3 4.8 4.6
Pattern 2
22 4.8 4.1 4.8 4.4 3.5 4.1
29 4.3 6.2 3.2 4.6 4.2 4.2
36 4.5 4.8 5.2 4.5 3.5 5.4
Pattern 3
22 4.7 4.5 6.3 4.4 4.9 4.9
29 4.1 5.1 4.6 4.3 6.0 5.6
36 4.7 4.4 4.6 4.1 5.1 4.4
Pattern 4
22 5.4 3.5 4.5 4.8 4.7 5.0
29 5.2 4.5 4.5 5.0 5.0 5.1
36 3.8 4.1 5.4 4.7 5.0 5.9
i.i.d. t dist. Pattern 1
22 4.3 4.4 3.2 4.1 4.1 5.2
29 5.0 3.8 3.2 3.7 4.2 6.3
36 4.3 4.5 4.0 5.0 5.7 5.4
i.i.d. Exp. Pattern 1
22 4.5 4.6 4.4 3.7 7.1 3.1
29 4.5 4.6 4.2 4.5 4.5 4.7
36 2.7 4.8 4.8 3.9 3.7 3.4
AR(1), φ=0.6 Pattern 1
22 4.3 5.3 4.6 3.8 4.2 4.0
29 4.6 5.4 5.1 4.0 4.4 4.3
36 4.7 4.0 4.0 4.1 4.2 3.9
AR(1), φ=0.3 Pattern 1
22 5.8 3.4 4.4 3.3 4.0 5.4
29 4.9 4.7 4.6 5.5 5.5 4.5
36 4.0 4.7 4.4 4.9 5.0 4.7
AR(1), φ=0.3 Pattern 1
22 4.6 4.6 4.9 4.3 5.4 4.1
29 4.8 5.3 4.1 4.3 4.2 5.2
36 3.6 3.9 4.9 4.8 4.9 4.9
AR(1), φ=0.6 Pattern 1
22 4.4 5.1 4.9 3.6 5.2 3.7
29 3.7 4.9 4.6 4.5 4.3 5.8
36 4.4 6.7 5.2 5.6 3.6 5.1
AR(5), φ=0.6 Pattern 1
22 4.4 4.7 5.1 4.2 4.5 5.5
29 4.3 5.1 4.3 3.2 3.5 4.2
36 5.3 4.5 6.1 4.2 4.6 5.4
AR(5), φ=0.3 Pattern 1
22 3.7 4.4 6.0 5.0 4.5 3.5
29 4.4 4.7 5.2 5.3 4.5 5.0
36 4.5 5.0 5.1 4.1 5.3 4.8
AR(5), φ=0.3 Pattern 1
22 5.3 4.3 5.7 4.8 4.1 4.3
29 3.9 4.8 4.1 4.0 4.3 4.9
36 4.2 5.5 5.1 3.6 4.5 3.6
AR(5), φ=0.6 Pattern 1
22 5.1 4.5 4.0 4.5 3.8 5.2
29 5.2 4.8 4.5 2.9 5.3 4.4
36 4.1 3.6 4.6 3.9 4.4 4.9
“Max” is the day in which the maximal proximal effect is attained. $\bar{\tau} = (1/T) \sum_{t=1}^T E[I_t]$ is the average availability. $\phi$ is the parameter for the AR(1) and AR(5) processes. Bold numbers are significantly (at .05 level) greater than .05.
Table 3B: Simulated power (%) when working assumptions are true. The duration of the study is 6 weeks. The associated sample size is given in Table 1B.
Error Term   Availability Pattern   Max   $\bar{\tau} = 0.5$    $\bar{\tau} = 0.7$
Average Proximal Effect                   0.10 0.08 0.06        0.10 0.08 0.06
i.i.d. Normal
Pattern 1
22 80.9 80.0 81.0 78.7 77.5 80.7
29 78.4 80.6 77.8 80.6 78.7 79.0
36 80.2 80.0 79.6 79.4 80.2 77.0
Pattern 2
22 80.3 78.1 78.8 80.6 79.6 79.8
29 80.3 79.1 80.2 77.4 79.9 79.9
36 76.8 79.3 80.2 78.5 78.4 80.0
Pattern 3
22 83.5 81.5 77.7 78.5 81.3 78.7
29 77.9 79.1 78.5 77.8 78.8 79.0
36 77.3 78.1 79.8 79.8 79.9 79.1
Pattern 4
22 77.2 79.7 81.8 80.2 79.0 78.8
29 80.1 78.8 80.3 79.4 80.6 80.1
36 80.5 79.4 80.0 78.9 79.9 78.1
i.i.d. t dist. Pattern 1
22 80.4 81.9 81.0 79.7 79.4 80.7
29 81.7 82.2 82.2 79.1 82.3 77.3
36 80.8 78.8 79.5 81.8 81.6 79.9
i.i.d. Exp. Pattern 1
22 81.0 81.6 79.7 77.2 80.1 80.2
29 80.6 82.4 80.3 79.0 79.8 80.3
36 82.1 79.8 80.8 79.8 79.5 80.3
AR(1), φ=0.6 Pattern 1
22 78.5 80.3 78.5 82.3 79.8 80.3
29 78.7 80.8 80.0 77.1 79.5 77.9
36 77.7 80.3 80.2 78.2 77.4 83.6
AR(1), φ=0.3 Pattern 1
22 77.9 79.0 79.6 80.0 77.8 80.4
29 77.9 79.1 80.0 79.0 78.0 78.4
36 78.1 81.2 80.2 80.7 80.9 78.4
AR(1), φ=0.3 Pattern 1
22 80.2 78.5 80.8 80.5 79.6 82.6
29 78.0 80.0 80.0 78.0 79.4 80.1
36 77.6 82.5 80.6 77.0 78.9 82.0
AR(1), φ=0.6 Pattern 1
22 80.4 79.8 79.5 80.7 79.5 82.0
29 78.9 81.5 79.3 79.5 81.3 79.5
36 79.5 78.4 78.8 80.1 77.9 77.8
AR(5), φ=0.6 Pattern 1
22 79.9 79.4 80.0 78.7 79.2 79.4
29 80.0 78.3 79.1 76.8 79.6 79.3
36 80.5 80.0 79.2 80.1 78.0 80.4
AR(5), φ=0.3 Pattern 1
22 79.2 80.4 81.9 81.3 77.7 79.1
29 80.0 82.3 80.5 80.5 82.2 79.2
36 75.9 78.7 79.3 79.0 79.4 79.9
AR(5), φ=0.3 Pattern 1
22 79.4 80.8 79.8 79.5 77.3 81.2
29 78.0 79.2 79.2 79.2 80.5 78.4
36 78.3 79.1 78.1 80.7 80.5 79.5
AR(5), φ=0.6 Pattern 1
22 80.2 77.9 80.3 78.6 78.4 80.3
29 76.9 79.3 80.2 79.1 80.6 80.5
36 78.7 84.0 80.1 78.8 79.3 78.8
“Max” is the day on which the maximal proximal effect is attained. $\bar{\tau} = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability. φ is the parameter of the AR(1) and AR(5) processes. Bold numbers are significantly (at the .05 level) less than .80.
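The footnote above describes φ as the parameter of the AR(1) and AR(5) error processes. As a rough illustration of what an AR(1) error series with a given φ looks like, the following Python sketch simulates one. It is not the authors' simulation code: the Gaussian innovations, the scaling to unit marginal variance, the series length of 210 and the function names `simulate_ar1` and `lag1_autocorrelation` are all assumptions made for this example.

```python
import random


def simulate_ar1(phi, n_steps, seed=0):
    """Simulate a stationary AR(1) series eps_t = phi * eps_{t-1} + v_t.

    Assumed for illustration (not taken from the paper): Gaussian
    innovations v_t, scaled so that Var(eps_t) = 1 for every t.
    """
    if not -1 < phi < 1:
        raise ValueError("phi must lie strictly between -1 and 1")
    rng = random.Random(seed)
    innovation_sd = (1 - phi ** 2) ** 0.5
    eps = [rng.gauss(0.0, 1.0)]  # first value drawn from the stationary law
    for _ in range(n_steps - 1):
        eps.append(phi * eps[-1] + rng.gauss(0.0, innovation_sd))
    return eps


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation, used here only as a sanity check."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


if __name__ == "__main__":
    # 210 points is a hypothetical series length chosen for the example.
    for phi in (-0.6, -0.3, 0.3, 0.6):
        eps = simulate_ar1(phi, n_steps=210, seed=1)
        print(phi, round(lag1_autocorrelation(eps), 2))
```

For a long series the sample lag-1 autocorrelation should sit close to φ, which gives a quick check that the generator behaves as intended.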
Table 4B: Simulated Type I error rate (%) and power (%) when the duration of the study is 4 weeks and 8 weeks. Error terms are i.i.d. N(0,1). The associated sample size is given in Table 1B.
Duration of Study | Availability Pattern | Max | $\bar{\tau} = 0.5$ | $\bar{\tau} = 0.7$
Average Proximal Effect (three columns under each $\bar{\tau}$):
0.10 0.08 0.06 0.10 0.08 0.06
4-week study, Type I error rate (%)
Pattern 1
15 4.1 4.7 6.3 5.3 5.5 5.6
22 5.2 4.4 4.7 3.1 4.7 4.4
29 5.7 5.5 5.6 4.3 4.2 4.2
Pattern 2
15 4.8 4.8 5.0 5.0 5.2 5.3
22 5.1 5.2 4.7 3.7 4.2 3.7
29 5.6 5.1 4.2 4.2 4.9 4.4
Pattern 3
15 4.7 5.0 4.6 6.1 5.3 5.1
22 4.9 4.0 6.6 4.2 3.8 4.1
29 4.7 4.3 5.1 4.6 5.8 3.5
Pattern 4
15 4.9 4.6 4.8 3.0 5.9 3.8
22 3.5 5.1 4.5 5.2 3.8 6.0
29 4.4 6.4 4.7 4.4 4.3 4.7
8-week study, Type I error rate (%)
Pattern 1
29 4.1 4.6 4.0 5.3 5.0 5.9
36 3.3 4.7 6.5 4.6 5.4 4.3
43 3.2 5.1 5.2 5.0 3.4 5.0
Pattern 2
29 3.9 5.0 4.5 4.2 3.7 4.1
36 3.8 4.6 4.9 4.5 3.4 5.2
43 3.9 5.4 5.0 3.4 3.8 5.0
Pattern 3
29 4.6 4.2 3.7 5.2 4.1 4.0
36 4.3 5.1 6.1 4.6 5.0 4.6
43 4.6 6.0 4.1 5.0 4.9 4.0
Pattern 4
29 4.5 5.2 2.9 3.6 5.3 4.4
36 4.5 5.2 3.7 2.7 3.7 4.7
43 4.2 7.1 4.9 4.4 4.5 4.8
4-week study, power (%)
Pattern 1
15 80.4 79.0 78.5 79.6 82.8 80.3
22 78.8 78.7 80.7 78.7 79.2 80.0
29 76.2 80.6 80.1 81.3 80.1 79.1
Pattern 2
15 82.4 77.8 77.2 75.9 80.0 78.9
22 77.2 80.3 81.5 75.8 80.7 82.0
29 80.1 79.3 80.1 78.0 77.7 76.9
Pattern 3
15 79.3 79.8 79.2 79.1 76.5 80.8
22 80.0 80.0 79.0 79.0 80.2 81.8
29 79.4 80.7 79.3 80.4 79.6 79.2
Pattern 4
15 82.6 78.3 79.2 80.5 80.0 79.5
22 80.4 80.7 79.3 79.1 78.5 79.2
29 78.4 79.2 78.5 79.6 79.2 80.5
8-week study, power (%)
Pattern 1
29 79.7 77.3 76.4 79.1 82.2 79.6
36 78.8 78.6 81.5 80.3 78.2 79.6
43 80.4 77.8 78.7 79.1 80.3 80.1
Pattern 2
29 79.3 81.1 79.8 78.7 79.7 80.2
36 81.2 78.5 79.0 81.3 80.8 78.2
43 80.3 81.5 77.5 75.1 78.8 78.1
Pattern 3
29 80.1 79.0 77.1 78.2 80.4 78.8
36 79.5 79.9 79.6 80.0 80.8 79.6
43 80.5 79.5 79.6 79.4 79.4 80.2
Pattern 4
29 82.1 79.7 80.7 79.7 79.0 78.4
36 77.8 78.2 80.1 77.9 76.9 79.5
43 79.6 78.5 78.1 79.4 80.6 79.5
“Max” is the day on which the maximal proximal effect is attained. $\bar{\tau} = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability. Bold numbers are significantly (at the .05 level) greater than .05 (for Type I error rate) or less than .80 (for power).
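The footnote flags simulated rates that differ significantly from the nominal 5% Type I error rate or the target 80% power. The Python sketch below shows one way such a flag could be computed, using a one-sided normal approximation to the binomial. The number of simulation replications is not stated in this section, so `n_reps = 1000` is purely an assumption, as are the function names; the authors may have used a different test.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def significantly_below(rate, target, n_reps, level=0.05):
    """One-sided test of H0: true rate >= target (normal approximation)."""
    se = math.sqrt(target * (1.0 - target) / n_reps)
    return normal_cdf((rate - target) / se) < level


def significantly_above(rate, target, n_reps, level=0.05):
    """One-sided test of H0: true rate <= target (normal approximation)."""
    se = math.sqrt(target * (1.0 - target) / n_reps)
    return 1.0 - normal_cdf((rate - target) / se) < level


if __name__ == "__main__":
    n_reps = 1000  # assumed number of simulated trials per table cell
    for power in (0.804, 0.788, 0.762):
        print("power", power, significantly_below(power, 0.80, n_reps))
    for type1 in (0.041, 0.052, 0.066):
        print("type I", type1, significantly_above(type1, 0.05, n_reps))
```

Under these assumptions the cutoff depends strongly on the replication count, so the flags produced here need not match the bold entries of the original tables.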
Table 5B: Simulated Type I error rate (%) and power (%) when the availability indicator, $I_t$, depends on the recent past treatments with $\eta = 0.2$. The expected availability is constant in $t$ and equal to 0.5. The duration of the study is 42 days. The associated sample size is given in Table 1B.
Error Term | φ | Max | Type I error rate (%): $\bar{\tau} = 0.5$, $\bar{\tau} = 0.7$ | Power (%): $\bar{\tau} = 0.5$, $\bar{\tau} = 0.7$
Average Proximal Effect (three columns under each $\bar{\tau}$):
0.10 0.08 0.06 0.10 0.08 0.06 0.10 0.08 0.06 0.10 0.08 0.06
AR(1)
-0.6
22 4.8 5.4 4.5 3.4 5.8 3.7 81.5 78.0 79.4 81.7 77.9 80.7
29 4.7 4.4 4.2 4.0 4.9 4.6 79.4 80.9 80.7 78.2 79.2 79.7
36 4.3 5.3 4.4 4.2 3.9 5.5 79.5 81.5 79.8 80.2 79.2 80.7
-0.3
22 4.7 3.8 4.4 3.5 4.4 4.6 78.7 81.2 80.3 80.9 77.9 78.5
29 3.8 4.0 4.9 3.5 5.0 4.4 80.1 79.5 81.2 77.3 79.5 77.1
36 2.7 5.7 4.0 3.3 4.7 5.2 76.8 80.4 79.9 78.8 79.5 79.4
0.3
22 4.8 4.1 4.4 5.0 5.4 3.6 83.0 79.8 79.4 81.3 78.9 79.2
29 4.9 4.6 5.0 4.4 5.5 5.6 79.5 80.3 82.2 78.5 80.7 77.6
36 4.9 4.9 4.2 3.3 4.5 4.8 80.0 78.9 79.5 81.7 79.4 79.6
0.6
22 4.5 5.1 4.7 4.3 4.6 4.0 80.3 78.9 81.1 81.2 81.5 77.9
29 3.4 4.5 5.1 4.4 4.3 4.6 79.3 76.2 79.4 81.3 80.6 79.4
36 4.8 4.3 4.2 4.1 4.5 4.5 77.5 80.5 80.9 76.7 80.0 79.7
AR(5)
-0.6
22 4.8 4.6 4.3 3.7 4.7 3.5 81.9 81.4 81.6 79.8 78.3 78.9
29 6.5 4.1 4.5 3.3 4.5 4.8 77.5 79.9 79.8 79.9 79.3 79.3
36 3.5 5.7 4.4 4.6 4.7 5.7 77.8 80.8 78.6 77.9 79.2 81.7
-0.3
22 4.3 4.9 4.0 4.3 5.6 5.0 77.7 81.8 80.0 80.1 80.3 81.1
29 3.9 4.0 5.0 3.2 5.7 5.1 80.0 80.9 80.3 80.6 80.3 77.8
36 4.0 3.6 4.7 4.8 4.8 3.2 79.0 80.4 80.8 80.1 79.0 76.5
0.3
22 3.5 4.9 5.0 4.1 3.8 4.1 77.4 82.9 78.5 80.6 81.4 80.2
29 4.6 6.1 4.7 4.7 4.1 4.1 78.7 82.0 78.0 81.4 76.5 81.3
36 5.1 4.4 4.0 3.2 3.9 4.7 79.7 81.8 78.6 79.1 77.4 79.0
0.6
22 5.0 4.6 4.3 4.0 4.0 5.5 80.5 79.4 82.5 79.2 81.1 81.0
29 5.6 4.3 6.9 5.6 3.4 3.1 78.3 80.0 80.5 80.8 80.4 78.4
36 4.8 4.8 4.8 3.5 3.7 5.5 78.2 80.5 80.3 77.6 80.5 79.1
“Max” is the day on which the maximal proximal effect is attained. $\bar{\tau} = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability. φ is the parameter of the AR(1) and AR(5) processes. Bold numbers are significantly (at the .05 level) greater than .05 (for Type I error rate) or less than .80 (for power).
Table 6B: Simulated Type I error rate (%) and power (%) when working assumption (a) is violated. Scenario 1. The average availability is 0.5. The day of maximal proximal effect is 29.
θ | $\bar{d}$ | Availability Pattern (first four columns: Type I error rate (%); last four columns: power (%))
Pattern 1 Pattern 2 Pattern 3 Pattern 4 Pattern 1 Pattern 2 Pattern 3 Pattern 4
θ = 0.5 $\bar{d}$
0.10 5.5 4.6 4.2 5.1 79.7 79.4 80.5 80.1
0.08 5.1 4.4 5.4 4.6 80.4 78.9 80.4 78.7
0.06 4.1 5.5 4.6 4.3 77.5 82.7 81.0 81.0
θ = $\bar{d}$
0.10 4.8 4.3 3.7 4.1 79.3 78.3 77.8 79.4
0.08 5.4 4.9 4.6 5.5 78.8 79.3 78.0 80.6
0.06 4.4 3.5 5.1 4.6 78.4 79.3 79.0 80.4
θ = 1.5 $\bar{d}$
0.10 4.4 4.1 4.4 4.8 78.3 80.5 78.4 79.9
0.08 5.0 4.3 4.3 3.9 80.5 79.7 78.7 81.9
0.06 4.0 5.1 5.5 5.6 77.2 80.8 81.6 80.3
θ = 2 $\bar{d}$
0.10 4.1 3.8 5.0 5.5 77.7 78.8 79.0 78.4
0.08 4.0 5.0 3.7 5.7 79.3 81.5 79.1 79.4
0.06 4.9 4.3 5.2 5.3 80.8 79.0 77.5 80.9
$\bar{d} = (1/T)\sum_{t=1}^{T} Z_t^{\prime} d$ is the average proximal effect. θ is the coefficient of $W_t$ in $E[Y_{t+1} \mid I_t = 1]$. Bold numbers are significantly (at the .05 level) greater than .05 (for Type I error rate) or lower than .80 (for power).
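The footnote defines the average proximal effect as the mean of $Z_t^{\prime} d$ over the $T$ decision points. The Python sketch below works through one concrete instance under assumptions that are not stated in this section: a quadratic-in-day feature vector $Z_t = (1, x_t, x_t^2)^{\prime}$ with $x_t = \lfloor (t-1)/5 \rfloor$, five decision points per day over 42 days, zero effect on day 1, and a peak on the stated day of maximal effect. The function name `quadratic_effect` and its parameters are hypothetical.

```python
def quadratic_effect(avg_effect, max_day, n_days=42, points_per_day=5):
    """Return (d, effects) with effects[t-1] = Z_t' d for t = 1..T.

    Assumptions for illustration only: Z_t = (1, x_t, x_t^2) with
    x_t = floor((t - 1) / points_per_day), zero effect on day 1, a peak
    on `max_day`, and a mean of `avg_effect` over all T decision points.
    """
    total_points = n_days * points_per_day
    days = [(t - 1) // points_per_day for t in range(1, total_points + 1)]
    peak = max_day - 1  # day index of the peak (day 1 has index 0)
    mean_x = sum(days) / len(days)
    mean_x2 = sum(x * x for x in days) / len(days)
    denom = mean_x2 - 2 * peak * mean_x
    if denom >= 0:
        raise ValueError("max_day is too early for a concave shape starting at 0")
    d3 = avg_effect / denom  # curvature, negative for a concave shape
    d2 = -2 * peak * d3      # slope chosen so the parabola peaks at `peak`
    d = (0.0, d2, d3)
    effects = [d[0] + d[1] * x + d[2] * x * x for x in days]
    return d, effects


if __name__ == "__main__":
    d, effects = quadratic_effect(avg_effect=0.10, max_day=29)
    average = sum(effects) / len(effects)
    print("d =", d)
    print("average effect =", round(average, 6))
    print("largest effect =", round(max(effects), 4))
```

Averaging `effects` recovers the requested $\bar{d}$ by construction, which is the quantity the tables index by 0.10, 0.08 and 0.06.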
Figure 4: Conditional expectation of the proximal response, $E[Y_{t+1} \mid I_t = 1]$, shown in three panels (Shape 1, Shape 2, Shape 3). The horizontal axis is the decision time point (Time, 0 to 200). The vertical axis is $E[Y_{t+1} \mid I_t = 1]$ (Response, 2.5 to 4.5).
Table 7B: Simulated Type I error rate (%) and power (%) when working assumption (a) is violated. Scenario 2. The shapes of $\alpha(t) = E[Y_{t+1} \mid I_t = 1]$ and the patterns of availability are provided in Figure 4 and Figure 1, respectively. The average availability is 0.5. The day of maximal proximal effect is 29. The associated sample size is given in Table 1B.
$\alpha(t)$ | $\bar{d}$ | Availability Pattern (first four columns: Type I error rate (%); last four columns: power (%))
Pattern 1 Pattern 2 Pattern 3 Pattern 4 Pattern 1 Pattern 2 Pattern 3 Pattern 4
Shape 1
0.10 3.6 4.3 4.7 4.5 77.4 80.2 76.2 75.9
0.08 5.9 3.8 4.1 3.4 79.7 80.1 78.9 80.6
0.06 4.6 5.7 4.2 6.5 78.7 76.3 78.3 79.9
Shape 2
0.10 4.8 4.8 4.4 4.1 79.2 79.1 78.5 79.7
0.08 3.9 5.4 4.8 4.3 77.7 80.4 76.8 80.9
0.06 5.1 5.5 3.4 4.9 78.3 79.4 79.8 80.2
Shape 3
0.10 5.1 3.5 4.3 4.4 79.1 79.4 75.6 78.0
0.08 4.6 5.0 6.2 3.8 78.3 78.1 79.1 78.1
0.06 4.8 4.4 5.4 4.2 78.0 78.3 79.8 77.7
$\bar{d} = (1/T)\sum_{t=1}^{T} Z_t^{\prime} d$ is the average standardized treatment effect. Bold numbers are significantly (at the .05 level) greater than .05 (for Type I error rate) or lower than .80 (for power).
Figure 5: Proximal main effects of treatment, $\{d(t)\}_{t=1}^{T}$, representing maintained, slightly degraded and severely degraded time-varying treatment effects. Panels are arranged by shape (Maintained, Slightly Degraded, Severely Degraded) and by day of maximal effect (Max = 15, Max = 22, Max = 29). The horizontal axis is the decision time point (Time, 0 to 200). The vertical axis is the standardized treatment effect (Proximal Effect, 0.00 to 0.20). The "Max" in the panel title refers to the day of maximal effect. The average standardized proximal effect is 0.1 in all plots.
Table 8B: Sample sizes when working assumption (b) is violated. The shape of the standardized proximal effect, $d(t) = \beta(t)/\bar{\sigma}$, and the pattern of availability, $E[I_t]$, are provided in Figure 5 and Figure 1, respectively.
$\bar{d}$ | Availability Pattern | Max | Shape of $d(t)$ under $\bar{\tau} = 0.5$: Maintained, Slightly Degraded, Severely Degraded | Shape of $d(t)$ under $\bar{\tau} = 0.7$: Maintained, Slightly Degraded, Severely Degraded
0.10
15 43 41 39 32 31 29
Pattern 1 22 43 41 40 33 31 30
29 38 37 38 29 28 29
15 43 41 39 33 31 30
Pattern 2 22 43 42 40 33 31 30
29 38 37 38 29 28 29
15 45 43 41 33 32 31
Pattern 3 22 44 43 42 33 32 31
29 37 38 39 28 28 29
15 42 39 37 32 30 28
Pattern 4 22 44 41 39 33 31 30
29 39 38 38 29 28 28
0.08
15 65 61 58 48 45 43
Pattern 1 22 65 62 60 48 46 44
29 56 55 56 42 41 42
15 65 61 59 48 45 43
Pattern 2 22 65 62 60 48 46 44
29 56 55 56 42 41 42
15 67 64 62 49 47 45
Pattern 3 22 66 64 63 48 47 46
29 56 56 59 41 41 43
15 63 59 55 47 44 41
Pattern 4 22 65 61 58 48 45 43
29 58 56 56 43 41 41
0.06
15 111 105 100 81 76 73
Pattern 1 22 112 106 103 81 77 75
29 96 94 96 70 69 70
15 112 105 100 81 77 73
Pattern 2 22 112 106 103 81 77 75
29 96 94 96 70 68 70
15 116 111 106 83 79 76
Pattern 3 22 114 110 108 82 79 78
29 95 96 101 69 69 72
15 108 100 94 79 74 70
Pattern 4 22 112 105 99 81 76 73
29 100 95 95 72 69 70
“Max” is the day on which the maximal proximal effect is attained. $\bar{d} = (1/T)\sum_{t=1}^{T} Z_t^{\prime} d$ is the average standardized treatment effect.
Table 9B: Simulated power (%) when working assumption (b) is violated. The shape of the standardized proximal effect, $d(t) = \beta(t)/\bar{\sigma}$, and the pattern of availability, $E[I_t]$, are provided in Figure 5 and Figure 1, respectively. The corresponding sample sizes are given in Table 8B.
$\bar{d}$ | Availability Pattern | Max | Shape of $d(t)$ under $\bar{\tau} = 0.5$: Maintained, Slightly Degraded, Severely Degraded | Shape of $d(t)$ under $\bar{\tau} = 0.7$: Maintained, Slightly Degraded, Severely Degraded
0.10
15 78.4 78.8 78.6 79.1 80.1 77.6
Pattern 1 22 80.4 79.5 81.2 80.0 76.9 77.9
29 80.4 79.2 78.9 77.3 76.8 81.1
15 78.6 79.9 79.9 80.1 80.4 81.3
Pattern 2 22 78.3 81.2 78.8 79.2 80.8 80.5
29 77.9 80.8 79.3 78.1 77.7 82.2
15 81.0 79.7 77.4 77.9 80.9 77.6
Pattern 3 22 78.9 79.1 80.0 79.7 79.4 75.9
29 80.9 77.5 77.7 80.6 79.2 78.5
15 79.7 79.5 77.9 79.5 81.7 78.0
Pattern 4 22 78.9 77.9 80.4 82.2 78.9 78.8
29 77.9 79.7 79.0 78.0 80.2 80.8
0.08
15 80.5 79.5 78.6 80.6 79.2 78.7
Pattern 1 22 78.9 78.7 78.8 78.9 80.7 80.3
29 76.6 78.0 78.3 80.9 78.6 80.4
15 81.0 79.3 78.7 82.0 80.5 80.1
Pattern 2 22 82.4 80.6 80.0 78.0 79.6 79.4
29 79.2 76.9 81.9 78.3 78.8 79.7
15 78.2 81.6 80.9 79.1 79.2 77.5
Pattern 3 22 80.9 79.5 78.6 79.2 78.3 81.4
29 80.4 79.3 77.5 77.9 80.2 82.3
15 79.4 79.4 78.1 78.6 77.4 78.8
Pattern 4 22 81.3 78.4 78.4 80.6 79.4 80.4
29 79.9 79.3 79.8 79.5 79.7 81.2
0.06
15 81.2 80.5 79.0 77.8 78.7 79.6
Pattern 1 22 80.0 81.7 79.8 80.7 80.5 80.2
29 81.2 78.7 79.2 81.2 79.7 80.1
15 78.7 77.5 81.4 80.7 81.0 80.7
Pattern 2 22 80.6 81.8 79.2 80.3 81.6 80.2
29 78.5 80.2 80.0 77.7 78.1 78.0
15 78.1 80.0 80.9 79.7 79.3 78.8
Pattern 3 22 81.2 80.2 80.0 78.3 82.2 81.1
29 79.6 81.6 79.8 80.2 81.6 76.9
15 78.2 79.8 78.9 79.5 77.3 79.2
Pattern 4 22 79.2 81.1 79.4 76.8 79.2 80.4
29 79.9 78.5 79.8 80.1 78.9 81.8
“Max” is the day on which the maximal proximal effect is attained. $\bar{d} = (1/T)\sum_{t=1}^{T} Z_t^{\prime} d$ is the average standardized treatment effect. Bold numbers are significantly (at the .05 level) lower than .80.
Table 10B: Simulated Type I error rate (%) and power (%) when working assumption (c) is violated. The trends of $\bar{\sigma}_t$ are provided in Figure 3. The standardized average effect is 0.1. $E[I_t] = 0.5$. The associated sample sizes are 41 and 42 when the day of maximal effect is 22 and 29, respectively.
φ in AR(1) | $\sigma_{1t}/\sigma_{0t}$ | Max = 22: const., trend 1, trend 2, trend 3 | Max = 29: const., trend 1, trend 2, trend 3
0.8 4.1 4.3 3.3 5.4 4.7 4.9 2.8 4.1
-0.6 1.0 4.6 5.0 4.0 4.4 4.4 4.8 4.2 4.3
1.2 3.8 4.5 5.2 5.5 4.3 4.1 4.5 3.8
0.8 5.2 4.7 4.0 3.4 5.4 4.9 6.2 4.5
-0.3 1.0 4.9 4.5 4.5 4.3 5.2 5.1 4.0 3.7
1.2 5.4 4.6 4.1 3.8 3.7 5.2 4.3 5.0
0.8 4.8 4.0 4.1 3.9 4.7 5.2 3.7 4.2
0 1.0 5.4 4.0 5.8 3.9 4.1 4.0 5.9 5.7
1.2 4.4 4.9 5.0 4.6 3.7 4.8 4.4 4.9
0.8 5.3 4.4 4.7 3.2 4.6 5.4 5.6 4.1
0.3 1.0 5.5 4.0 3.4 3.7 5.0 4.6 4.0 3.6
1.2 3.8 4.5 4.5 4.8 4.5 5.0 6.2 4.3
0.8 5.5 3.9 5.3 3.8 3.3 3.5 5.1 4.2
0.6 1.0 4.0 3.7 5.2 5.1 4.8 5.1 5.0 4.7
1.2 4.5 5.1 4.6 4.9 4.5 4.4 4.7 4.8
0.8 82.8 82.7 83.7 79.9 83.6 80.6 88.7 79.2
-0.6 1.0 81.1 79.1 79.9 74.8 77.7 74.3 84.8 70.4
1.2 76.6 76.3 76.3 70.6 77.6 72.0 80.7 70.4
0.8 83.0 83.0 86.0 80.3 82.7 79.2 87.9 78.0
-0.3 1.0 77.6 81.4 80.7 74.9 79.1 74.5 86.0 73.7
1.2 78.2 76.9 77.3 73.4 74.4 71.2 81.0 70.7
0.8 84.6 84.6 82.1 79.0 81.8 81.5 88.0 78.0
0 1.0 80.1 78.6 80.9 73.6 77.7 76.5 86.1 71.8
1.2 76.0 76.7 77.4 70.6 74.5 69.9 83.4 69.6
0.8 83.6 79.7 84.6 79.7 82.1 81.7 88.2 75.7
0.3 1.0 81.5 82.4 82.3 73.9 79.5 74.6 85.1 71.5
1.2 74.8 76.6 78.2 71.1 75.5 71.1 82.5 70.1
0.8 81.4 83.1 83.5 80.5 83.1 77.1 86.6 76.9
0.6 1.0 80.7 76.4 79.0 74.8 80.4 73.4 84.7 76.8
1.2 77.0 77.5 77.0 73.5 74.4 72.5 81.6 69.4
φ is the parameter in the AR(1) process for $\{\epsilon_t\}_{t=1}^{T}$. Bold numbers are significantly (at the .05 level) greater than .05 (for Type I error rate) or lower than .80 (for power).
Table 11B: Simulated Type I error rate (%) when working assumption (d) is violated. $E[I_t] = 0.5$. The average effect is 0.1 and the day of maximal effect is 29. N = 42.
Parameters in $I_t$ | $\gamma_1$ | $\gamma_2$ = -0.1, -0.2, -0.3
-0.2 5.7 3.2 3.9
η1=0.1,η2= 0.1 -0.5 3.2 4.2 4.9
-0.8 4.2 5.1 5.5
-0.2 5.4 3.8 3.9
η1=0.2,η2= 0.1 -0.5 4.4 4.4 4.8
-0.8 4.7 4.3 4.6
-0.2 4.5 5.0 5.0
η1=0.1,η2= 0.2 -0.5 4.9 3.8 6.0
-0.8 4.7 4.8 4.8
$\eta_1, \eta_2$ are parameters in generating $I_t$. $\gamma_1, \gamma_2$ are coefficients in the model of $Y_{t+1}$. Bold numbers are significantly (at the .05 level) greater than .05.
Table 12B: Degradation in power when the average proximal effect is underestimated. The day of maximal effect is 29 and the average availability is 0.5.
$\bar{d}$ in Sample Size Formula | True $\bar{d}$ | Availability Pattern
Pattern 1 Pattern 2 Pattern 3 Pattern 4
0.10 (N = 42)
0.098 76.2 78.9 77.6 78.6
0.096 75.1 74.6 78.8 74.0
0.094 73.7 70.7 75.4 73.4
0.092 71.5 71.6 73.2 71.6
0.090 68.9 68.4 69.6 67.3
0.088 65.4 65.6 66.1 65.7
0.086 66.4 67.9 65.2 66.7
0.084 62.3 63.4 63.0 59.6
0.082 60.0 60.2 60.5 58.2
0.080 58.9 59.8 57.8 61.4
0.08(N = 64)
0.078 78.2 80.2 76.8 75.8
0.076 77.3 76.7 76.2 75.4
0.074 73.1 72.2 71.2 71.4
0.072 70.7 71.0 69.4 68.2
0.070 68.2 66.0 65.2 66.1
0.068 65.5 64.3 64.6 65.7
0.066 62.8 62.3 61.8 59.4
0.064 61.9 58.5 59.5 62.1
0.062 53.9 52.6 57.0 56.9
0.060 54.6 51.1 54.8 53.4
0.06(N = 109)
0.058 75.6 76.9 74.0 78.1
0.056 73.9 73.1 73.1 72.7
0.054 68.6 71.1 69.3 68.5
0.052 65.4 69.4 63.6 66.8
0.050 61.0 62.8 64.1 63.2
0.048 57.4 58.6 56.4 56.1
0.046 53.6 53.4 52.9 54.8
0.044 52.0 48.9 50.1 53.0
0.042 45.7 43.9 44.9 46.4
0.040 40.4 42.2 42.3 42.7
Table 13B: Degradation in power when the average availability is underestimated. The maximal treatment effect is attained at day 29 and the average proximal main effect is 0.1.
$(1/T)\sum_{t=1}^{T} \tau_t$ in Sample Size Formula | True $(1/T)\sum_{t=1}^{T} \tau_t$ | Availability Pattern
Pattern 1 Pattern 2 Pattern 3 Pattern 4
0.5 (N = 42)
0.048 76.4 81.7 76.0 78.2
0.046 73.9 75.5 73.6 75.8
0.044 70.6 72.1 71.0 71.7
0.042 70.8 70.6 74.2 70.3
0.040 70.3 69.2 65.7 68.6
0.038 66.0 66.8 67.8 67.0
0.036 64.0 62.5 62.4 62.9
0.034 60.8 61.3 59.4 63.9
0.032 56.4 59.2 54.7 59.8
0.030 51.4 53.1 51.9 54.5
0.7 (N = 32)
0.068 79.5 76.1 79.1 75.0
0.066 77.3 75.7 74.0 76.4
0.064 74.5 74.7 73.5 77.1
0.062 73.2 73.0 75.1 72.5
0.060 69.8 70.5 73.5 72.5
0.058 71.0 69.6 71.3 67.3
0.056 68.8 70.3 66.6 64.0
0.054 68.1 65.8 65.3 68.6
0.052 62.4 64.9 65.6 62.9
0.050 60.6 63.3 62.8 61.4
Acknowledgment
This research was supported by NIH grants P50DA010075, R01HL12544001 and grant U54EB020404 awarded
by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) through funds provided by the
trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov).
... Despite the abundance of mHealth applications, the majority are not customized to the individual user and have not been evaluated for validity and efficacy. [15][16][17] As a result, digital health technology can benefit from the integration of evidence-based adaptive design. ...
... Currently, adaptive and in-the-moment interventions are primarily based on preprogrammed decision rules developed from domain knowledge. [16][17][18] Rather than using predefined rules to dictate the delivery of interventions, the precision of behavior change technologies can be improved with ML. For example, a smoking cessation program may trigger an intervention when wrist sensors detect that the patient is smoking. ...
... Suppose the intervention consists of multiple components-for example, to promote physical activity-the application can include both an "evening component" aimed at helping the user plan next-day physical activity as well as a "throughout the day component" involving brief, tailored physical activity suggestions. 16 The evening component might be randomized each evening between 2 intervention options: a message to prompt physical activity planning for the following day and no message. The day component might be randomized multiple times per day between a tailored activity suggestion and no message. ...
Article
Chronic disease now affects approximately half of the US population, causes 7 in 10 deaths, and accounts for roughly 80% of US health care expenditure. Because the root causes of chronic diseases are largely behavioral, effective therapies require frequent, individualized interventions that extend beyond the hospital and clinic to reach patients in their day-to-day lives. However, a mismatch currently exists between what the health care system is equipped to provide and the interventions necessary to effectively address the chronic disease burden. To remedy this health crisis, we present an individualized, data-driven digital approach for chronic disease management and prevention through precision behavior change. The rapid growth of information, biological, and communication technologies makes this an opportune time to develop digital tools that deliver precision interventions for health behavior change to address the chronic disease crisis. Building on this rapid growth, we propose a framework that includes the precise targeting of risk-producing behaviors using real-time sensing technology, machine learning data analysis to identify the most effective intervention, and delivery of that intervention with health-reinforcing feedback to provide real-time, individualized support to empower sustainable health behavior change.
... Future research may consider semi-regular non-intervention periods or micro-randomized designs to properly calibrate algorithms. 46 Fourth, inherent in the use of machine learning is the "black box" element in which the data guide intervention provision rather than a priori defined evidence-based theoretical models. 47 Future studies should consider different statistical procedures and methodologies (such as micro-randomized designs or control systems engineering) to develop and validate a theoretical model for momentary interventions on eating behaviors. ...
... 47 Future studies should consider different statistical procedures and methodologies (such as micro-randomized designs or control systems engineering) to develop and validate a theoretical model for momentary interventions on eating behaviors. 46,48 ...
Article
Full-text available
Suboptimal weight losses are partially attributable to lapses from a prescribed diet. We developed an app (OnTrack) that uses ecological momentary assessment to measure dietary lapses and relevant lapse triggers and provides personalized intervention using machine learning. Initially, tension between user burden and complete data was resolved by presenting a subset of lapse trigger questions per ecological momentary assessment survey. However, this produced substantial missing data, which could reduce algorithm performance. We examined the effect of more questions per ecological momentary assessment survey on algorithm performance, app utilization, and behavioral outcomes. Participants with overweight/obesity ( n = 121) used a 10-week mobile weight loss program and were randomized to OnTrack-short (i.e. 8 questions/survey) or OnTrack-long (i.e. 17 questions/survey). Additional questions reduced ecological momentary assessment adherence; however, increased data completeness improved algorithm performance. There were no differences in perceived effectiveness, app utilization, or behavioral outcomes. Minimal differences in utilization and perceived effectiveness likely contributed to similar behavioral outcomes across various conditions.
... Currently, Randomized Controlled Trials (RCTs) are not well matched to the pace of technology development, and there is a need for methodologies that can account for effects more rapidly [80]. One recent methodological approach that is aimed at meeting evidence demands from medicine, simultaneously to being relevant to methodological requirements from technology development is found in micro-randomized trials [81] Micro-RCTs aims to provide a data-based method for evaluating online interventions by providing an experimental design for use in testing the proximal effects of the newly developed treatments. ...
Article
Full-text available
Internet-Delivered Psychological Treatments (IDPT) are based on evidence-based psychological treatment models adjusted for interaction through the Internet. The use of Internet technologies has the potential to increase the availability of evidence-based mental health services for a far-reaching population with the use of fewer resources. Despite evidence that Internet Interventions can be effective means in mental health morbidities, most current IDPT systems are tunnel-based, inflexible, and non-interoperable. Hence it becomes essential to understand which elements of an Internet intervention contribute to effectiveness and treatment outcomes. By analogy, adaptation is a central aspect of successful face-to-face mental health therapy. Adaptability to patient needs can be regarded as an essential outcome factor in online systems for mental health interventions as well. While some aspects of rule-based and machine-learning-based adaptation have attracted attention in recent IDPT development, systematic reporting of core components, dimensions of adaptiveness, information architecture, and strategies for adaptation in the IDPT system are still lacking. To bridge this gap, we propose a model that shows how adaptive systems are represented in classical control theory and discuss how the model can be used to specify adaptive IDPT systems. Concerning the reference model, we outline the core components of adaptive IDPT systems, the main adaptive elements, dimensions of adaptiveness, information architecture applied to adaptive systems, and strategies used in the adaptation process. We also provide comprehensive guidelines on how to develop an adaptive IDPT system based on the Person-Based Approach.
... Shifts at the Food and Drug Administration (FDA) related to evidence and the increasingly broader array of methods being advanced within the behavioral medicine community, such as factorial trials [22], sequential multiple assignment trials [23], microrandomization trials [24], hybrid clinical trials [25], N-of-1 crossover designs [26], and control optimization trials [27], all point to a richer array of methods [28,29] that can be used to answer the question "does it work?" in ways that align with the inherent complexity of digital health. This tension, thus, points to the important role that behavioral scientists have in helping insurance groups and others, such as the FDA, to understand these differing methods and, through the process of triangulation, develop refined practices for defining if a tool is evidence based. ...
Article
Full-text available
Digital health promises to increase intervention reach and effectiveness for a range of behavioral health outcomes. Behavioral scientists have a unique opportunity to infuse their expertise in all phases of a digital health intervention, from design to implementation. The aim of this study was to assess behavioral scientists’ interests and needs with respect to digital health endeavors, as well as gather expert insight into the role of behavioral science in the evolution of digital health. The study used a two-phased approach: (a) a survey of behavioral scientists’ current needs and interests with respect to digital health endeavors (n = 346); (b) a series of interviews with digital health stakeholders for their expert insight on the evolution of the health field (n = 15). In terms of current needs and interests, the large majority of surveyed behavioral scientists (77%) already participate in digital health projects, and from those who have not done so yet, the majority (65%) reported intending to do so in the future. In terms of the expected evolution of the digital health field, interviewed stakeholders anticipated a number of changes, from overall landscape changes through evolving models of reimbursement to more significant oversight and regulations. These findings provide a timely insight into behavioral scientists’ current needs, barriers, and attitudes toward the use of technology in health care and public health. Results might also highlight the areas where behavioral scientists can leverage their expertise to both enhance digital health’s potential to improve health, as well as to prevent the potential unintended consequences that can emerge from scaling the use of technology in health care.
... We also emphasize the significance of our methodology for adaptive randomized trials within a single unit, which are tailored to approximate an optimal treatment rule as sample size grows. Literature on single-unit adaptive sequential trials is almost non-existent to our knowledge, except for the ground-breaking work by Murphy et al [6,10,20,22,24,33]. We aim to build on these ideas in this manuscript, by providing model-free efficient estimators of causal effects based on single-subject interventions on the corresponding unit-level outcome. ...
Preprint
Full-text available
Consider the case that one observes a single time-series, where at each time t one observes a data record O(t) involving treatment nodes A(t), possible covariates L(t) and an outcome node Y(t). The data record at time t carries information for an (potentially causal) effect of the treatment A(t) on the outcome Y(t), in the context defined by a fixed dimensional summary measure Co(t). We are concerned with defining causal effects that can be consistently estimated, with valid inference, for sequentially randomized experiments without further assumptions. More generally, we consider the case when the (possibly causal) effects can be estimated in a double robust manner, analogue to double robust estimation of effects in the i.i.d. causal inference literature. We propose a general class of averages of conditional (context-specific) causal parameters that can be estimated in a double robust manner, therefore fully utilizing the sequential randomization. We propose a targeted maximum likelihood estimator (TMLE) of these causal parameters, and present a general theorem establishing the asymptotic consistency and normality of the TMLE. We extend our general framework to a number of typically studied causal target parameters, including a sequentially adaptive design within a single unit that learns the optimal treatment rule for the unit over time. Our work opens up robust statistical inference for causal questions based on observing a single time-series on a particular unit.
... Currently, behavioral theory and within-subject randomization methods (e.g., microrandomized trials) are used to identify appropriate tailoring variables and decision rules for JITAI (Liao, Klasnja, Tewari, & Murphy, 2015). Although we did not randomize participants to specific contexts in the present study, since such a design would be implausible, our within-subjects design allowed us to estimate the proximal association of various contextual factors with smoking. ...
Article
The present study provides detailed contextual information about smoking habits among young Korean American smokers with the goal of characterizing situations where they are most at risk for smoking. Relevant situational factors included location, social context, concurrent activities, time of day, affective states, and food and beverage consumption. Using ecological momentary assessment (EMA) over 7 days, participants (N = 78) were instructed to respond to smoking prompts (n = 2614) and non-smoking prompts (n = 2136) randomly scheduled throughout the day. At each prompt, participants completed a short survey about immediate contextual factors. We used multilevel models to evaluate the association between contextual factors and smoking and further explored the distribution of smoking locations and concurrent activities across each social context and reason for smoking. Compared to non-smoking events, smoking events were associated with being outside, the presence of Korean friends, socializing, consuming alcohol, and experiencing more stress relative to one’s average stress level (all ps < .01). Further analyses involving only smoking events showed that when participants smoked alone, they were most commonly at home (50 %) and most often studying/working (28 %). When smoking with Korean friends, participants were most often outside (38 %) and socializing (54 %). When smoking to reduce craving, participants were most often at home (39 %) and studying/working (25 %). To our knowledge, this is the first study to provide detailed descriptions of real-time smoking contexts among young Korean American smokers. Information with this level of granularity is needed to develop effective just-in-time adaptive interventions (JITAIs) for smoking cessation.
... It does not directly provide an efficacious intervention, which requires making choices on not only the timing of delivery, but also the right content, the adaptation mechanisms for personalizing it to the individual and the user's context, and selecting the right modality for delivery (e.g., on the phone, on a smart watch). Conducting a micro-randomized trial [49] could be a natural next step to determine the most efficacious strategy for personalized JITIs. Several populations can be targeted for stress JITI where stress plays a significant role. ...
Conference Paper
Management of daily stress can be greatly improved by delivering sensor-triggered just-in-time interventions (JITIs) on mobile devices. The success of such JITIs critically depends on being able to mine the time series of noisy sensor data to find the most opportune moments. In this paper, we propose a time series pattern mining method to detect significant stress episodes in a time series of discontinuous and rapidly varying stress data. We apply our model to 4 weeks of physiological, GPS, and activity data collected from 38 users in their natural environment to discover patterns of stress in real life. We find that the duration of a prior stress episode predicts the duration of the next stress episode and stress in mornings and evenings is lower than during the day. We then analyze the relationship between stress and objectively rated disorder in the surrounding neighborhood and develop a model to predict stressful episodes.
... The goal, nonetheless, is to establish the efficacy of a treatment or intervention in a given cohort, and not at the personalized level. The recently proposed micro-randomization designs [26, 12] generalize single-case designs by allowing more traditional statistical analysis of multiple participants concomitantly. Micro-randomization designs adopt a potential outcomes framework and allow the causal inference of proximal, time-dependent causal effects of just-in-time mobile interventions. ...
Article
Mobile health studies can leverage longitudinal sensor data from smartphones to guide the application of personalized medical interventions. These studies are particularly appealing due to their ability to attract a large number of participants. In this paper, we argue that the adoption of an instrumental variable approach for randomized trials with imperfect compliance provides a natural framework for personalized causal inference of medication response in mobile health studies. Randomized treatment suggestions can be easily delivered to the study participants via electronic messages popping up on the smartphone screen. Under quite general assumptions and as long as there is some degree of compliance between the randomized suggested treatment and the treatment effectively adopted by the study participant, we can identify the causal effect of the actual treatment on the response in the presence of unobserved confounders. We implement a personalized randomization test for testing the null hypothesis of no causal effect of the treatment on the response, and evaluate its performance in a large-scale simulation study encompassing data generated from linear and non-linear time series models under several simulation conditions. In particular, we evaluate the empirical power of the proposed test under varying degrees of compliance between the suggested and actual treatment adopted by the participant. Our empirical investigations provide encouraging results in terms of power and control of type I error rates.
Article
The micro-randomized trial is a sequential randomized experimental design to empirically evaluate the effectiveness of mobile health intervention components that may be delivered at hundreds or thousands of decision points. Micro-randomized trials have motivated a new class of causal estimands, termed causal excursion effects, for which semiparametric inference can be conducted via a weighted, centred least-squares criterion (Boruvka et al., 2018). Causal excursion effects allow health scientists to answer important scientific questions about how intervention effectiveness may change over time or may be moderated by individual characteristics, time-varying context or past responses. Existing definitions and associated methods assume between-subject independence and noninterference. Deviations from these assumptions often occur. In this paper, causal excursion effects are revisited under potential cluster-level treatment effect heterogeneity and interference, where the treatment effect of interest may depend on cluster-level moderators. Utility of the proposed methods is shown by analysing data from a multi-institution cohort of first-year medical residents in the United States.
Chapter
Non-adherence to a drug therapy is often the reason for not achieving the therapeutic goals in patients. Thus, measuring and monitoring drug adherence is an important aspect to understand patients’ adherence patterns and behavior as well as to provide supportive measures to enhance or reestablish adherence to a prescribed regimen. A variety of different Adherence Measurement and Monitoring Systems (AMS) exist although there is no single AMS or method considered to be the gold standard today. These range from simple Apps that issue alerts and reminders to patients up to AMS that facilitate automated, telemedical interactions between the physician and the patient to initiate corrective interventions by making use of a variety of data sources. When applied to patients with several morbidities, co-morbidities, and disabilities appropriate AMS still remain a challenge.
Article
The sandwich estimator in generalized estimating equations (GEE) approach underestimates the true variance in small samples and consequently results in inflated type I error rates in hypothesis testing. This fact limits the application of the GEE in cluster-randomized trials (CRTs) with few clusters. Under various CRT scenarios with correlated binary outcomes, we evaluate the small sample properties of the GEE Wald tests using bias-corrected sandwich estimators. Our results suggest that the GEE Wald z-test should be avoided in the analyses of CRTs with few clusters even when bias-corrected sandwich estimators are used. With t-distribution approximation, the Kauermann and Carroll (KC)-correction can keep the test size to nominal levels even when the number of clusters is as low as 10 and is robust to the moderate variation of the cluster sizes. However, in cases with large variations in cluster sizes, the Fay and Graubard (FG)-correction should be used instead. Furthermore, we derive a formula to calculate the power and minimum total number of clusters one needs using the t-test and KC-correction for the CRTs with binary outcomes. The power levels as predicted by the proposed formula agree well with the empirical powers from the simulations. The proposed methods are illustrated using real CRT data. We conclude that with appropriate control of type I error rates under small sample sizes, we recommend the use of GEE approach in CRTs with binary outcomes because of fewer assumptions and robustness to the misspecification of the covariance structure. Copyright © 2014 John Wiley & Sons, Ltd.
Article
Background Healthcare reform in the United States is encouraging Federally Qualified Health Centers and other primary-care practices to integrate treatment for addiction and other behavioral health conditions into their practices. The potential of mobile health technologies to manage addiction and comorbidities such as HIV in these settings is substantial but largely untested. This paper describes a protocol to evaluate the implementation of an E-Health integrated communication technology delivered via mobile phones, called Seva, into primary-care settings. Seva is an evidence-based system of addiction treatment and recovery support for patients and real-time caseload monitoring for clinicians. Methods/Design Our implementation strategy uses three models of organizational change: the Program Planning Model to promote acceptance and sustainability, the NIATx quality improvement model to create a welcoming environment for change, and Rogers’s diffusion of innovations research, which facilitates adaptations of innovations to maximize their adoption potential. We will implement Seva and conduct an intensive, mixed-methods assessment at three diverse Federally Qualified Healthcare Centers in the United States. Our non-concurrent multiple-baseline design includes three periods — pretest (ending in four months of implementation preparation), active Seva implementation, and maintenance — with implementation staggered at six-month intervals across sites. The first site will serve as a pilot clinic. We will track the timing of intervention elements and assess study outcomes within each dimension of the Reach, Effectiveness, Adoption, Implementation, and Maintenance framework, including effects on clinicians, patients, and practices. 
Our mixed-methods approach will include quantitative (e.g., interrupted time-series analysis of treatment attendance, with clinics as the unit of analysis) and qualitative (e.g., staff interviews regarding adaptations to implementation protocol) methods, and assessment of implementation costs. Discussion If implementation is successful, the field will have a proven technology that helps Federally Qualified Health Centers and affiliated organizations provide addiction treatment and recovery support, as well as a proven strategy for implementing the technology. Seva also has the potential to improve core elements of addiction treatment, such as referral and treatment processes. A mobile technology for addiction treatment and accompanying implementation model could provide a cost-effective means to improve the lives of patients with drug and alcohol problems. Trial registration ClinicalTrials.gov (NCT01963234).
Article
Importance Patients leaving residential treatment for alcohol use disorders are not typically offered evidence-based continuing care, although research suggests that continuing care is associated with better outcomes. A smartphone-based application could provide effective continuing care. Objective To determine whether patients leaving residential treatment for alcohol use disorders with a smartphone application to support recovery have fewer risky drinking days than control patients. Design, Setting, and Participants An unmasked randomized clinical trial involving 3 residential programs operated by 1 nonprofit treatment organization in the Midwestern United States and 2 residential programs operated by 1 nonprofit organization in the Northeastern United States. In total, 349 patients who met the criteria for DSM-IV alcohol dependence when they entered residential treatment were randomized to treatment as usual (n = 179) or treatment as usual plus a smartphone (n = 170) with the Addiction–Comprehensive Health Enhancement Support System (A-CHESS), an application designed to improve continuing care for alcohol use disorders. Interventions Treatment as usual varied across programs; none offered patients coordinated continuing care after discharge. A-CHESS provides monitoring, information, communication, and support services to patients, including ways for patients and counselors to stay in contact. The intervention and follow-up period lasted 8 and 4 months, respectively. Main Outcomes and Measures Risky drinking days: the number of days during which a patient’s drinking in a 2-hour period exceeded 4 standard drinks for men and 3 standard drinks for women, with standard drink defined as one that contains roughly 14 g of pure alcohol (12 oz of regular beer, 5 oz of wine, or 1.5 oz of distilled spirits).
Patients were asked to report their risky drinking days in the previous 30 days on surveys taken 4, 8, and 12 months after discharge from residential treatment. Results For the 8 months of the intervention and 4 months of follow-up, patients in the A-CHESS group reported significantly fewer risky drinking days than did patients in the control group, with a mean of 1.39 vs 2.75 days (mean difference, 1.37; 95% CI, 0.46-2.27; P = .003). Conclusions and Relevance The findings suggest that a multifeatured smartphone application may have significant benefit to patients in continuing care for alcohol use disorders. Trial Registration clinicaltrials.gov Identifier: NCT01003119
Article
Mobile health (mHealth) seeks to improve individuals' health and well-being by continuously monitoring their status, rapidly diagnosing medical conditions, recognizing behaviors, and delivering just-in-time interventions, all in the user's natural mobile environment. The Web extra at http://youtu.be/o2mieSywutY is an audio interview in which Santosh Kumar, Wendy Nilsen, and Mani Srivastava discuss the path toward realizing mobile health systems.
Article
Mobile technologies could be a powerful media for providing individual level support to health care consumers. We conducted a systematic review to assess the effectiveness of mobile technology interventions delivered to health care consumers. We searched for all controlled trials of mobile technology-based health interventions delivered to health care consumers using MEDLINE, EMBASE, PsycINFO, Global Health, Web of Science, Cochrane Library, UK NHS HTA (Jan 1990-Sept 2010). Two authors extracted data on allocation concealment, allocation sequence, blinding, completeness of follow-up, and measures of effect. We calculated effect estimates and used random effects meta-analysis. We identified 75 trials. Fifty-nine trials investigated the use of mobile technologies to improve disease management and 26 trials investigated their use to change health behaviours. Nearly all trials were conducted in high-income countries. Four trials had a low risk of bias. Two trials of disease management had low risk of bias; in one, antiretroviral (ART) adherence, use of text messages reduced high viral load (>400 copies), with a relative risk (RR) of 0.85 (95% CI 0.72-0.99), but no statistically significant benefit on mortality (RR 0.79 [95% CI 0.47-1.32]). In a second, a PDA based intervention increased scores for perceived self care agency in lung transplant patients. Two trials of health behaviour management had low risk of bias. The pooled effect of text messaging smoking cessation support on biochemically verified smoking cessation was (RR 2.16 [95% CI 1.77-2.62]). Interventions for other conditions showed suggestive benefits in some cases, but the results were not consistent. No evidence of publication bias was demonstrated on visual or statistical examination of the funnel plots for either disease management or health behaviours. To address the limitation of the older search, we also reviewed more recent literature. 
Text messaging interventions increased adherence to ART and smoking cessation and should be considered for inclusion in services. Although there is suggestive evidence of benefit in some other areas, high quality adequately powered trials of optimised interventions are required to evaluate effects on objective outcomes. Please see later in the article for the Editors' Summary.
Article
In observational cohort mortality studies with prolonged periods of exposure to the agent under study, it is not uncommon for risk factors for death to be determinants of subsequent exposure. For instance, in occupational mortality studies date of termination of employment is both a determinant of future exposure (since terminated individuals receive no further exposure) and an independent risk factor for death (since disabled individuals tend to leave employment). When current risk factor status determines subsequent exposure and is determined by previous exposure, standard analyses that estimate age-specific mortality rates as a function of cumulative exposure may underestimate the true effect of exposure on mortality whether or not one adjusts for the risk factor in the analysis. This observation raises the question, which if any population parameters can be given a causal interpretation in observational mortality studies? In answer, we offer a graphical approach to the identification and computation of causal parameters in mortality studies with sustained exposure periods. This approach is shown to be equivalent to an approach in which the observational study is identified with a hypothetical double-blind randomized trial in which data on each subject's assigned treatment protocol has been erased from the data file. Causal inferences can then be made by comparing mortality as a function of treatment protocol, since, in a double-blind randomized trial missing data on treatment protocol, the association of mortality with treatment protocol can still be estimated. We reanalyze the mortality experience of a cohort of arsenic-exposed copper smelter workers with our method and compare our results with those obtained using standard methods. We find an adverse effect of arsenic exposure on all-cause and lung cancer mortality which standard methods fail to detect.
Article
This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood.
Article
Background Few studies have evaluated how to combine dietary and physical activity (PA) interventions to enhance adherence. Purpose We tested how sequential versus simultaneous diet plus PA interventions affected behavior changes. Methods Two hundred participants over age 44 years not meeting national PA and dietary recommendations (daily fruit and vegetable servings and percent of calories from saturated fat) were randomized to one of four 12-month telephone interventions: sequential (exercise first or diet first), simultaneous, or attention control. At 4 months, the other health behavior was added in the sequential arms. Results Ninety-three percent of participants were retained through 12 months. At 4 months, only exercise first improved PA, and only the simultaneous and diet-first interventions improved dietary variables. At 12 months, mean levels of all behaviors in the simultaneous arm met recommendations, though not in the exercise- and diet-first arms. Conclusions We observed a possible behavioral suppression effect of early dietary intervention on PA that merits investigation.