Availability Patterns. The x-axis is decision time point and y-axis is the expected availability. Pattern 2 represents availability varying by day of the week with higher availability on the weekends and lower mid-week. The average availability is 0.5 in all cases.

Micro-Randomized Trials in mHealth

Peng Liao ∗1, Predrag Klasnja2, Ambuj Tewari1, and Susan A. Murphy1

1Department of Statistics, University of Michigan, Ann Arbor, MI 48109

2School of Information, University of Michigan, Ann Arbor, MI 48109

April 7, 2015

Abstract

The use and development of mobile interventions is experiencing rapid growth. In “just-in-time” mobile

interventions, treatments are provided via a mobile device that are intended to help an individual make healthy

decisions “in the moment,” and thus have a proximal, near future impact. Currently the development of mobile

interventions is proceeding at a much faster pace than that of associated data science methods. A ﬁrst step

toward developing data-based methods is to provide an experimental design for use in testing the proximal

effects of these just-in-time treatments. In this paper, we propose a “micro-randomized” trial design for this

purpose. In a micro-randomized trial, treatments are sequentially randomized throughout the conduct of the

study, with the result that each participant may be randomized at the 100s or 1000s of occasions at which a

treatment might be provided. Further, we develop a test statistic for assessing the proximal effect of a treatment

as well as an associated sample size calculator. We conduct simulation evaluations of the sample size calculator

in various settings. Rules of thumb that might be used in designing the micro-randomized trial are discussed.

This work is motivated by our collaboration on the HeartSteps mobile application designed to increase physical

activity.

Key words: Micro-randomized Trial, Sample Size Calculation, mHealth

1 Introduction

The use and development of mobile interventions is experiencing rapid growth. Mobile interventions are used across the health fields and include treatments used to improve HIV medication adherence [ ], to improve activity [ ], accompany counseling/pharmacotherapy in substance use [ ], reinforce abstinence in addictions [ ] and to support recovery from alcohol dependence [ ]. Mobile interventions in maintaining adherence to anti-retroviral therapy and smoking cessation have shown sufficient effectiveness and replicability in trials and thus have been recommended for inclusion in health services [8].

However, as Nilsen et al. [ ] state, “In fact, the development of mHealth technologies is currently progressing at a much faster pace than the science to evaluate their validity and efficacy, introducing the risk that ineffective or even potentially harmful or iatrogenic applications will be implemented.” Indeed, reviews, while reporting preliminary evidence of effectiveness, call for more programmatic, data-based approaches to constructing mobile interventions [ ]. In particular these reviews call for research that focuses on data-informed development of these complex multi-component interventions prior to their evaluation in standard randomized controlled trials. But methods for using data to inform the design and evaluation of adaptive mobile interventions have lagged behind the use and deployment of these interventions [13, 20, 26].

Many mobile interventions are designed to be “just-in-time” interventions, meaning that they intend to

provide treatments that help an individual make healthy decisions in the moment, such as engaging in a

desirable behavior (e.g., taking a medication on time) or effectively coping with a stressful situation. As such,

mobile interventions are often intended to have proximal, near-term effects. A ﬁrst approach toward developing

data-based methods for evaluation of mobile health interventions is to provide an experimental design for use

in testing the proximal effects of the treatments. This paper proposes a micro-randomized trial design for this

purpose. In a micro-randomized trial, treatments are sequentially randomized throughout the conduct of the

study, with the result that each participant may be randomized at the hundreds or thousands of occasions at

which a treatment might be provided. This repeated randomization of treatments under investigation enables

causal modeling of each treatment’s time-varying proximal effect as well as modeling of time-varying effect moderation. Thus, the micro-randomized trial can be seen as a first experimental step in the development of effective mobile interventions that are composed of sequences of treatments. We propose to size the trial to detect the proximal main effect of the treatments. This is akin to the use of factorial designs in constructing multi-component interventions. In these factorial designs [ ], a first analysis often involves testing if the main effect of each treatment is equal to 0.

∗Corresponding author. 439 West Hall, 1085 South University Ave, Ann Arbor, MI 48109. Email: pengliao@umich.edu

arXiv:1504.00238v1 [stat.ME] 1 Apr 2015

This work is motivated by our collaboration on the HeartSteps mobile application for increasing physical

activity, which we will use to illustrate our discussion. One of the treatments in HeartSteps is suggestions for

physical activity which are tailored to the person’s current context. HeartSteps can deliver these suggestions

at any of the ﬁve time intervals during the day, which correspond roughly to morning commute, mid-day,

mid-afternoon, evening commute, and post-dinner times. When a suggestion is delivered, the user’s phone

plays a notiﬁcation sound, vibrates and lights up, and the suggestion is displayed on the lock screen of the phone.

These suggestions encourage activity in the current context and are intended to have an effect (getting a person

to walk) within the next hour.

In the following section, we introduce the micro-randomized trial design. In Section 3 we precisely define the proximal main effect of a treatment, using the language of potential outcomes. We develop the test statistic for assessing the proximal effect of a treatment, as well as an associated sample size calculator, in Sections 4 and 5. Next, in Section 6, we provide a simulation evaluation of the sample size calculator. We end, in Section 7, with a discussion.

2 Micro-Randomized Trial

In general an individual’s longitudinal data, recorded via mobile devices that sense and provide treatments, can be written as

\[ \{S_0, S_1, A_1, S_2, A_2, \ldots, S_t, A_t, \ldots, S_T, A_T, S_{T+1}\} \]

where $t$ indexes decision times, $S_0$ is a vector of baseline information (gender, ethnicity, etc.) and $S_t$ ($t \ge 1$) is information collected between time $t-1$ and $t$ (e.g. summary measures of recent activity levels, engagement, and burden; day of week; weather; busyness indicated by smartphone calendar, etc.). The treatment at time $t$ is denoted by $A_t$; throughout this paper we consider binary options for the treatments (e.g., the treatment is on or off). The proximal response, denoted by $Y_{t+1}$, is a known function of $\{S_t, A_t, S_{t+1}\}$. Here we assume that the longitudinal data are independent and identically distributed across $N$ individuals. Note that this assumption would be violated if, for example, some of the treatments are used to enhance social support between individuals in the study.

In HeartSteps, data ($S_t$) is collected both passively via sensors and via participant self-report. Each participant is provided a “Jawbone” band [ ], worn at the wrist, which collects daily step count and the amount of sleep the user had the previous night. Furthermore, sensors on the phone are used to collect a variety of information at each of the 5 time points during the day, including the time-stamp, location, busyness of planned activities on the phone calendar and other activity on the phone. Each evening, self-report data is collected including utility and burden ratings. The proximal response, $Y_{t+1}$, for activity suggestions is the step count in the hour following time $t$.

A decision time is a point in time at which—based on participant’s current state, past behavior, or current

context—treatment may need to be delivered. Decision times vary by the nature of the intervention component.

In HeartSteps, the decision times for activity suggestions are 5 times per day over the 42 day study duration.

For an alcohol-recovery application that provides an intervention when an individual goes within 10 feet of a

high risk location (e.g. a liquor store), decision points might be every 2 minutes, the frequency at which the

application would get the person’s current location and assess whether she is close to a high-risk location. In

a long-term study of an intervention for multiple health behaviors, the decision points might be weekly or

monthly at which times, decisions are made regarding whether to change the focus from one behavior (e.g.,

physical activity) to another (e.g., diet). Finally, in many studies there is an option for an individual to press a “panic” button, indicating the need for help; for such interventions, decision times correspond to times at which the panic button might be pressed.

A micro-randomized trial is a trial in which at each decision time $t$, participants are randomized to a treatment option, denoted by $A_t$. Treatment options may correspond to whether or not a treatment is provided at a decision time; for example in HeartSteps, whether or not the individual is provided a lock-screen activity suggestion. Or treatment options may be alternative types of treatment that can be provided at the same decision time; for example, a daily step goal treatment might have two options, a fixed 10,000-steps-a-day goal or an adaptive goal based on the user’s activity level on the previous day. Considerations of treatment burden often imply that the randomization will not be uniform. For example in HeartSteps, $P[A_t = 1] = 0.4$ so that, if an individual is always available, on average 2 lock-screen activity messages are delivered per day.
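The arithmetic behind "on average 2 messages per day" can be checked with a small simulation. The sketch below is illustrative only: it assumes every participant is always available, and the function names are hypothetical rather than part of HeartSteps. Only the study length, the number of decision times per day, and the randomization probability come from the text.

```python
import random

DAYS = 42            # HeartSteps study duration in days (from the text)
TIMES_PER_DAY = 5    # decision times per day (from the text)
RHO = 0.4            # randomization probability P[A_t = 1] (from the text)

def randomize_participant(rng):
    """Return the 0/1 treatment indicators A_1, ..., A_T for one participant."""
    total = DAYS * TIMES_PER_DAY
    return [1 if rng.random() < RHO else 0 for _ in range(total)]

def mean_messages_per_day(n_participants, seed=0):
    """Average number of delivered messages per participant-day (always available)."""
    rng = random.Random(seed)
    delivered = sum(sum(randomize_participant(rng)) for _ in range(n_participants))
    return delivered / (n_participants * DAYS)

avg = mean_messages_per_day(n_participants=500)
print(round(avg, 2))  # close to 5 * 0.4 = 2 in expectation
```

Each participant is randomized at all 210 decision times, which is what distinguishes this design from a trial with a single randomization per person.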

In designing, that is, determining the sample size for, a micro-randomized trial we focus on the reduced longitudinal data

\[ \{S_0, I_1, A_1, Y_2, I_2, A_2, Y_3, \ldots, I_t, A_t, Y_{t+1}, \ldots, I_T, A_T, Y_{T+1}\}. \]

The variable $I_t$ is an “availability” indicator. The availability indicator is coded as $I_t = 1$ if the individual is available for treatment and $I_t = 0$ otherwise. At some decision times feasibility, ethics or burden considerations mean that the individual is unavailable for treatment and thus $A_t$ should not be delivered. Consider again HeartSteps: if sensors indicate that the individual is likely driving a car or the individual is currently walking, then the lock-screen activity message should not occur. Other examples of when individuals are unavailable for treatment include: in the alcohol recovery setting, a “warning” treatment would only be potentially provided when sensors indicate that the individual is within 10 feet of a high risk location, or a treatment might only be provided if the individual reports a high level of craving. If the application has a panic button, then only in the interval, of some number of seconds, in which the panic button is pressed is it appropriate to provide “panic button” treatments. Individuals may be unavailable for treatment by choice. For example, the HeartSteps application permits the individual to turn off the lock-screen activity messages; this option is considered critical to maintaining participant buy-in and engagement with HeartSteps. After viewing the lock-screen activity message, the individual has the option of turning off the lock-screen message for 4 or 8 or 12 hours. After the specified time interval, the lock-screen message automatically turns on again. To summarize, the availability indicator at time $t$ is the indicator for the subpopulation at time $t$ among which we are interested in assessing the proximal main effect of the treatment; we are uninterested in assessing the proximal main effect of a treatment among individuals for whom it is unethical to provide treatment or for whom it makes no scientific sense to provide treatment or among those who refuse to be provided a treatment.
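A minimal sketch of how availability gates the randomization is given below. The helper name and the example availability sequence are hypothetical, and recording $A_t = 0$ at unavailable times is a convention of this sketch rather than something the text prescribes.

```python
import random

def micro_randomize(available, rho, rng):
    """Randomize treatment only at decision times where I_t = 1."""
    actions = []
    for i_t in available:
        if i_t == 1:
            actions.append(1 if rng.random() < rho else 0)
        else:
            actions.append(0)   # unavailable: no treatment is delivered
    return actions

rng = random.Random(1)
available = [1, 1, 0, 1, 0]     # hypothetical availability indicators I_1..I_5
actions = micro_randomize(available, rho=0.4, rng=rng)
```

Only the decision times with $I_t = 1$ later contribute to the estimate of the proximal main effect.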

3 Proximal Main Effect of a Treatment

As discussed above, treatments in mobile health interventions are often designed so as to have a proximal

effect (e.g., increase activity in near future, help an individual manage current cravings for drugs or food, take

medications on schedule, etc.). As a result, a ﬁrst question in developing a mobile health intervention is whether

the treatments have a proximal effect. Here we develop sample size formulae that guarantee a stated power to

detect the proximal effect of a treatment. In particular we aim to test if the proximal main effect is zero.

To define the proximal main effect of a treatment, we use potential outcomes [ ]. Our use of potential outcome notation is slightly more complicated than usual because treatment can only be provided when an individual is available. As a result, we index the potential outcomes by decision rules that incorporate availability. In particular define $d(a, i)$ for $a \in \{0,1\}$, $i \in \{0,1\}$ by $d(a, 0) =$ “unavailable-do nothing” and $d(a, 1) = a$. Then for each $a_1 \in \mathcal{A}_1 = \{0,1\}$, define $D_1(a_1) = d(a_1, I_1)$. Then we denote the potential proximal responses following decision time 1 by $\{Y_2^{D_1(1)}, Y_2^{D_1(0)}\}$ and denote the potential availability indicators at decision time 2 by $\{I_2^{D_1(1)}, I_2^{D_1(0)}\}$. Next for each $\bar a_2 = (a_1, a_2)$ with $a_1, a_2 \in \{0,1\}$, define $D_2(\bar a_2) = d(a_2, I_2^{D_1(a_1)})$. Define $\bar D_2(\bar a_2) = (D_1(a_1), D_2(\bar a_2))$. A potential proximal response following decision time 2 and corresponding to $\bar a_2$ is $Y_3^{\bar D_2(\bar a_2)}$ and a potential availability indicator at decision time 3 is $I_3^{\bar D_2(\bar a_2)}$. Similarly, for each $\bar a_t = (a_1, \ldots, a_t) \in \mathcal{A}_t = \{(a_1, \ldots, a_t) \mid a_i \in \{0,1\},\ i = 1, \ldots, t\}$, define $D_t(\bar a_t) = d(a_t, I_t^{\bar D_{t-1}(\bar a_{t-1})})$ and $\bar D_t(\bar a_t) = (D_1(a_1), \ldots, D_t(\bar a_t))$. For each $\bar a_t = (a_1, \ldots, a_t) \in \mathcal{A}_t$, the potential proximal response is $Y_t^{\bar D_{t-1}(\bar a_{t-1})}$ (following decision time $t-1$) and the potential availability indicator is $I_t^{\bar D_{t-1}(\bar a_{t-1})}$ at decision time $t$.

We define the proximal main effect of a treatment at time $t$ among available individuals by:

\[ \beta(t) = E\left( Y_{t+1}^{\bar D_t(\bar A_{t-1},1)} - Y_{t+1}^{\bar D_t(\bar A_{t-1},0)} \,\middle|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1 \right) \]

where the expectation is taken with respect to the distribution of the potential outcomes and randomization in $\bar A_{t-1}$. This proximal effect is conditional in that the effect of treatment at time $t$ is defined for only individuals available for treatment at time $t$, that is, $I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1$. This proximal effect is a main effect in that the effect is marginal over any effects of $\bar A_{t-1}$. The former conditional aspect of the definition is related to the concept of viable or feasible dynamic treatment regimes [ ] in which one assesses only the causal effect of treatments that can actually be provided.

Consider the proximal main effect, $\beta(t)$, as $t$ varies across time. $\beta(t)$ may vary across time for a variety of reasons. To see this consider the case of HeartSteps. Here $\beta(t)$ might initially increase with increasing $t$ as participants learn and practice the activities suggested on the lock-screen. For larger $t$ one might expect to see decreasing or flat $\beta(t)$ due to habituation (participants begin to, at least partially, ignore the messages). This time variation in $\beta(t)$ can be attributed to both the immediate effect of a lock-screen activity message as well as interactions between the past lock-screen activity messages and the present activity message; the time variation occurs at least partially due to the marginal character of $\beta(t)$. Alternately the conditional definition of $\beta(t)$ means that the effect is only defined among the population of individuals who are available at decision time $t$. Changes in this population may cause changes in $\beta(t)$ across time. Again consider HeartSteps. At earlier time points, participants are highly engaged, yet have not developed habits that in various ways increase their activity, thus most participants will be available. However as time progresses, some participants may develop sufficiently positive activity habits or anticipate activity suggestions, thus at later decision times these participants may be already active and thus unavailable to receive a suggestion. Other participants may become increasingly disengaged and repeatedly turn off the lock-screen activity messages; these participants are also unavailable. Thus as time progresses, $\beta(t)$ may vary due to changes in the subpopulation of participants among whom it is appropriate to assess the effect of the lock-screen activity message.

Our main objective in determining the sample size will be to assure sufficient power to detect alternatives to the null hypothesis of no proximal main effect, $H_0: \beta(t) = 0$, $t = 1, \ldots, T$, for a trial with $T$ decision points (if $\beta(t)$ is nonzero then for the population available at decision time $t$, there is a proximal effect). The proposed test will be focused on detecting smooth, i.e., continuous in $t$, alternatives to this null hypothesis.

To express $\beta(t)$ in terms of the observed data distribution, we assume consistency [ ]. This assumption is that for each $t$, the observed $Y_t$ and observed $I_t$ equal the corresponding potential outcomes, $Y_t^{\bar D_{t-1}(\bar a_{t-1})}$, $I_t^{\bar D_{t-1}(\bar a_{t-1})}$, whenever $\bar A_{t-1} = \bar a_{t-1}$. This assumption may be violated if some of the treatments promote social linkages between participants, for example, to enhance social/emotional support or to compete in mobile games. In these cases it would be more appropriate to additionally index each individual’s potential outcomes by other participants’ treatments. The micro-randomization plus the consistency assumption implies that the proximal main effect of treatment at time $t$ among available individuals, $\beta(t)$, can be written as

\[
\begin{aligned}
\beta(t) &= E\left[ Y_{t+1}^{\bar D_t(\bar A_{t-1},1)} \,\middle|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1 \right] - E\left[ Y_{t+1}^{\bar D_t(\bar A_{t-1},0)} \,\middle|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1 \right] \\
&= E\left[ Y_{t+1}^{\bar D_t(\bar A_{t-1},1)} \,\middle|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1, A_t = 1 \right] - E\left[ Y_{t+1}^{\bar D_t(\bar A_{t-1},0)} \,\middle|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1, A_t = 0 \right] \\
&= E\left[ Y_{t+1}^{\bar D_t(\bar A_t)} \,\middle|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1, A_t = 1 \right] - E\left[ Y_{t+1}^{\bar D_t(\bar A_t)} \,\middle|\, I_t^{\bar D_{t-1}(\bar A_{t-1})} = 1, A_t = 0 \right] \\
&= E[Y_{t+1} \mid I_t = 1, A_t = 1] - E[Y_{t+1} \mid I_t = 1, A_t = 0]
\end{aligned}
\]

where the second equality follows from the randomization of the $A_t$’s and the last equality follows from the consistency assumption.
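The final expression says that, at a single decision time, the proximal main effect is a difference in mean responses between treated and untreated individuals among those who are available. The sketch below illustrates this on simulated data. The generative model (a baseline of 1, unit-variance Gaussian noise, availability probability 0.7) and the function name are assumptions made only for illustration; they are not the paper's simulation design.

```python
import random

def simulate_effect_estimate(n, beta_t, rho=0.4, tau=0.7, seed=0):
    """Estimate beta(t) at one decision time by a difference in means
    among available individuals (I_t = 1)."""
    rng = random.Random(seed)
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        if rng.random() >= tau:      # I_t = 0: unavailable, excluded from the contrast
            continue
        a = 1 if rng.random() < rho else 0           # micro-randomization
        y = 1.0 + beta_t * a + rng.gauss(0.0, 1.0)   # hypothetical response model
        sums[a] += y
        counts[a] += 1
    return sums[1] / counts[1] - sums[0] / counts[0]

estimate = simulate_effect_estimate(n=200_000, beta_t=0.5)
```

With a large simulated sample the estimate should land near the value of `beta_t` used to generate the data, because treatment is randomized independently of everything else.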

4 Test Statistic

Our sample size formula is based on a test statistic for use in testing $H_0: \beta(t) = 0$, $t = 1, \ldots, T$, against a scientifically plausible alternative. This alternative should be formed based on conversations with domain experts. Here we construct a test statistic to detect alternatives that are, at least approximately, linear in a vector parameter, $\beta$, that is, alternatives of the form $Z_t'\beta$, where the $p \times 1$ vector, $Z_t$, is a function of $t$ and covariates that are unaffected by treatment such as time of day or day of week. In the case of HeartSteps, a plausible alternative is quadratic:

\[ Z_t'\beta = \left(1, \left\lfloor \tfrac{t-1}{5} \right\rfloor, \left\lfloor \tfrac{t-1}{5} \right\rfloor^2\right)\beta \tag{1} \]

where $\beta = (\beta_1, \beta_2, \beta_3)'$ ($p = 3$). Recall that in HeartSteps there are 5 decision times per day; $\lfloor \frac{t-1}{5} \rfloor$ translates decision times $t$ to days. This rather simplistic parametrization marginalizes across the day and treats the weekends and weekdays similarly.
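The mapping from decision times to days in the quadratic alternative can be sketched as follows. The function names are hypothetical; the formula is the one in (1).

```python
def z_vector(t, times_per_day=5):
    """Return Z_t = (1, day, day^2) with day = floor((t - 1) / times_per_day)."""
    day = (t - 1) // times_per_day
    return (1, day, day * day)

def proximal_effect(t, beta):
    """Evaluate the quadratic alternative Z_t' beta for beta = (beta1, beta2, beta3)."""
    return sum(z * b for z, b in zip(z_vector(t), beta))
```

All five decision times within a day share the same $Z_t$, so the effect is constant within a day and changes only from day to day.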

We propose to use the alternate, $H_1: \beta(t) = Z_t'\beta$, $t = 1, \ldots, T$, to construct the test statistic. We base the test statistic on the estimator of $\beta$ in a least squares fit of a working model. A simple working model based on the alternative is:

\[ E[Y_{t+1} \mid I_t = 1, A_t] = B_t'\alpha + (A_t - \rho_t) Z_t'\beta \tag{2} \]

over all $t \in \{1, \ldots, T\}$, where $\rho_t$ is the known randomization probability ($P[A_t = 1] = \rho_t$) and the $q \times 1$ vector $B_t$ is a function of $t$ and covariates that are unaffected by treatment such as time of day or day of week. Note that $A_t$ is centered by subtracting off the randomization probability; thus the working model for $\alpha(t) = E[Y_{t+1} \mid I_t = 1]$ is $B_t'\alpha$. The estimators $\hat\alpha, \hat\beta$ minimize the least squares error:

\[ P_N\left\{ \sum_{t=1}^T I_t \left( Y_{t+1} - B_t'\alpha - (A_t - \rho_t) Z_t'\beta \right)^2 \right\} \tag{3} \]

where $P_N\{f(X)\}$ is defined as the average of $f(X)$ over the sample.
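To make the least squares criterion (3) concrete, the sketch below solves it in the simplest special case $B_t = Z_t = 1$ (so $p = q = 1$), where the normal equations reduce to a 2×2 system. This special case, the function name, and the record layout are assumptions for illustration; the paper's working model allows general vectors $B_t$ and $Z_t$.

```python
def fit_constant_effect(records, rho):
    """Least squares fit of Y = alpha + (A - rho) * beta over available decision times.

    records: iterable of (I_t, A_t, Y_{t+1}) tuples pooled over participants and times.
    """
    n = s12 = s22 = t1 = t2 = 0.0
    for i_t, a_t, y in records:
        if i_t != 1:
            continue                  # unavailable times get weight I_t = 0 in (3)
        x = a_t - rho                 # centered treatment indicator
        n += 1.0
        s12 += x
        s22 += x * x
        t1 += y
        t2 += x * y
    det = n * s22 - s12 * s12         # determinant of the 2x2 normal equations
    alpha_hat = (s22 * t1 - s12 * t2) / det
    beta_hat = (n * t2 - s12 * t1) / det
    return alpha_hat, beta_hat

# Noise-free toy data with alpha = 2, beta = 3, rho = 0.4; one unavailable record.
toy = [(1, 1, 3.8), (1, 0, 0.8), (1, 1, 3.8), (0, 1, 100.0), (1, 0, 0.8)]
alpha_hat, beta_hat = fit_constant_effect(toy, rho=0.4)
```

The unavailable record is ignored regardless of its response value, mirroring the factor $I_t$ in (3).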

Note that from a technical perspective, minimizing the least squares criterion, (3), is reminiscent of a GEE analysis [ ] with identity link function and a working correlation matrix equal to the identity. Thus it is natural to consider a non-identity working correlation matrix as is common in GEE. This, however, is problematic from a causal inference perspective. To see this suppose that the true conditional expectation is in fact $E[Y_{t+1} \mid I_t = 1, A_t] = B_t'\alpha^* + (A_t - \rho_t) Z_t'\beta^*$, that is, the causal parameter, $\beta(t)$, is equal to $Z_t'\beta^*$. Further suppose that the working correlation matrix has off-diagonal elements and that we estimate $\beta^*$ by minimizing the weighted (by the inverse of the working correlation matrix) least squares criterion. In this case the resulting estimating equations include sums of terms such as $I_t \left( Y_{t+1} - B_t'\alpha - (A_t - \rho_t) Z_t'\beta \right) I_s (A_s - \rho_t)$ for $t > s$. Unfortunately, both availability at time $t$, $I_t$, as well as $Y_{t+1}$ may be affected by treatment in the past (in particular, $A_s$); thus absent strong assumptions $E\left[ I_t \left( Y_{t+1} - B_t'\alpha^* - (A_t - \rho_t) Z_t'\beta^* \right) I_s (A_s - \rho_t) \right]$ is unlikely to be 0. Recall that a minimal condition for consistency of estimators of $(\alpha^*, \beta^*)$ is that the estimating equations have expectation 0; thus absent further assumptions, the estimators derived from the weighted least squares criterion are likely biased. Another possibility is to include a time-varying variance term in the least squares criterion, that is, the $t$th entry in (3) might be weighted by a $\sigma_t^{-2}$. This would be useful in the data analysis; however, for sample size calculations, values of these variances are unlikely to be available. Thus for simplicity we use the unweighted least squares criterion in (3).

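Assume that the matrices $Q = \sum_{t=1}^T E[I_t]\rho_t(1-\rho_t) Z_t Z_t'$ and $\sum_{t=1}^T E[I_t] B_t B_t'$ are invertible. The least squares estimators, $\hat\alpha, \hat\beta$, are consistent estimators of

\[ \tilde\alpha = \left( \sum_{t=1}^T E[I_t] B_t B_t' \right)^{-1} \sum_{t=1}^T E[I_t]\, \alpha(t)\, B_t \tag{4} \]

and

\[ \tilde\beta = \left( \sum_{t=1}^T E[I_t]\rho_t(1-\rho_t) Z_t Z_t' \right)^{-1} \sum_{t=1}^T E[I_t]\rho_t(1-\rho_t)\, \beta(t)\, Z_t \tag{5} \]

respectively. Furthermore if $\beta(t)$ is in fact equal to $Z_t'\beta$ for some $\beta$, then $Z_t'\tilde\beta = \beta(t)$. This is the case even if $E[Y_{t+1} \mid I_t = 1] \ne B_t'\tilde\alpha$. In the appendix (Lemma 1), we prove these results and also show that, under moment conditions, $\sqrt{N}(\hat\beta - \tilde\beta)$ is asymptotically normal with mean 0 and variance $\Sigma_\beta = Q^{-1} W Q^{-1}$ where

\[ W = E\left[ \left( \sum_{t=1}^T \tilde\epsilon_t I_t (A_t - \rho_t) Z_t \right) \times \left( \sum_{t=1}^T \tilde\epsilon_t I_t (A_t - \rho_t) Z_t' \right) \right] \]

and $\tilde\epsilon_t = Y_{t+1} - I_t B_t'\tilde\alpha - (A_t - \rho_t) I_t Z_t'\tilde\beta$. To test the null hypothesis $H_0: \beta(t) = 0$, $t = 1, \ldots, T$, one can use a test statistic based on the alternative, e.g.

\[ N \hat\beta' \hat\Sigma_\beta^{-1} \hat\beta \tag{6} \]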

where $\hat\Sigma_\beta = \hat Q^{-1} \hat W \hat Q^{-1}$ and $\hat Q$, $\hat W$ are plug-in estimators. Note that this test statistic results from a GEE analysis with identity link function and a working correlation matrix equal to the identity matrix, for which sample size formulae have been developed [ ]. We build on this work as follows. As Tu et al. [ ] discuss, under the null hypothesis the large sample distribution of this statistic is a chi-squared distribution with $p$ degrees of freedom. If $N$, the sample size, is small, then, as recommended in [17], we make small adjustments to improve the small sample approximation to the distribution of the test statistic. In particular Mancl and DeRouen recommend adjusting $\hat W$ using the “hat” matrix; see the formulae for the adjusted $\hat W$ as well as $\hat Q$ in Appendix A. Also in small sample settings, investigators commonly suggest that instead of using a critical value based on the chi-squared distribution, a critical value based on the $t$-distribution should be used [ ]. As we are considering a simultaneous test for multiple parameters we form the critical value based on Hotelling’s $T$-squared distribution [ ]. Hotelling’s $T$-squared distribution is a multiple of the $F$ distribution given by $\frac{d_1(d_1+d_2-1)}{d_2} F_{d_1,d_2}$; here we use $d_1 = p$ and $d_2 = N - q - p$ (recall $q$ is the number of parameters in the nuisance parameter vector, $\alpha$); see the appendix for a rationale. In the following, the rejection region for the test of $H_0: \beta(t) = 0$, $t = 1, \ldots, T$, based on (6) is

\[ \left\{ N \hat\beta' \hat\Sigma_\beta^{-1} \hat\beta > F^{-1}_{p,N-q-p}\left( \frac{(N-q-p)(1-\alpha_0)}{p(N-q-1)} \right) \right\} \]

where $\alpha_0$ is the desired significance level.

5 Sample Size Formulae

As Tu et al. [ ] have developed general sample size formulas in the GEE setting, here we focus on considerations specific to the setting of micro-randomized trials. To size the study, we will determine the sample size needed to detect the alternate, $\beta(t)$, with:

\[ H_1: \beta(t)/\bar\sigma = d(t), \quad t = 1, \ldots, T \]

where $\bar\sigma^2 = (1/T) \sum_{t=1}^T E\left[ \mathrm{Var}\left( Y_{t+1} \mid I_t = 1, A_t \right) \right]$ is the average variance and $d(t)$ is a standardized treatment effect. When $N$ is large and $H_1$ holds, $N \hat\beta' \hat\Sigma_\beta^{-1} \hat\beta$ is approximately distributed as a noncentral chi-squared $\chi^2_p(c_N)$, where $c_N$, the non-centrality parameter, satisfies $c_N = N (\bar\sigma \tilde d)' \Sigma_\beta^{-1} (\bar\sigma \tilde d)$, and $\tilde d = \left( \sum_{t=1}^T E[I_t]\rho_t(1-\rho_t) Z_t Z_t' \right)^{-1} \sum_{t=1}^T E[I_t]\rho_t(1-\rho_t)\, d(t)\, Z_t$ [27]. Note that $\tilde d = \tilde\beta/\bar\sigma$.

Working Assumptions. To derive the sample size formula, we use the form of the non-centrality parameter of the limiting non-central chi-squared distribution, along with working assumptions. The working assumptions are used to simplify the form of $\Sigma_\beta^{-1}$. In particular, we make the following working assumptions:

(a) $E(Y_{t+1} \mid I_t = 1) = B_t'\alpha$, for some $\alpha \in \mathbb{R}^q$

(b) $\beta(t) = Z_t'\beta$ for some $\beta \in \mathbb{R}^p$

(c) $\mathrm{Var}(Y_{t+1} \mid I_t = 1, A_t)$ is constant in $t$ and $A_t$

(d) $E[\tilde\epsilon_t \tilde\epsilon_s \mid I_t = 1, I_s = 1, A_t, A_s]$ is constant in $A_t, A_s$

where, as before, $\tilde\epsilon_t = Y_{t+1} - I_t B_t'\tilde\alpha - (A_t - \rho_t) I_t Z_t'\tilde\beta$. See the proof in appendix A (Lemma 2). The above working assumptions are somewhat simplistic but as will be seen below the resulting sample size formula is robust to moderate violations. First, under these working assumptions the alternative hypothesis can be re-written as

\[ H_1: \beta/\bar\sigma = d, \tag{7} \]

where $d$ is a $p$ dimensional vector of standardized effects. Furthermore, $\Sigma_\beta$ is given by

\[ \Sigma_\beta = \bar\sigma^2 \left( \sum_{t=1}^T E[I_t]\rho_t(1-\rho_t) Z_t Z_t' \right)^{-1} \]

and thus $c_N$ is given by

\[ c_N = N d' \left( \sum_{t=1}^T E[I_t]\rho_t(1-\rho_t) Z_t Z_t' \right) d. \tag{8} \]

To improve the small sample approximation, we use the multiple of the $F$-distribution as discussed above. Thus the sample size, $N$, is found by solving

\[ \frac{p(N-q-1)}{N-q-p} F_{p,N-q-p;c_N}\left( F^{-1}_{p,N-q-p}\left( \frac{(N-q-p)(1-\alpha_0)}{p(N-q-1)} \right) \right) = 1 - \beta_0 \tag{9} \]

where $F_{p,N-q-p;c_N}$ is the noncentral $F$ distribution with noncentrality parameter, $c_N$, and $1 - \beta_0$ is the desired power. The inputs to this sample size formula are $\{Z_t\}_{t=1}^T$, a scientifically meaningful value for $d$ (see below for an illustration), the time-varying availability pattern, $\{E[I_t]\}_{t=1}^T$, the desired significance level, $\alpha_0$, and power, $1 - \beta_0$.
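One way to turn the sample size equation into a computation is sketched below, using only the Python standard library. It rests on an interpretation that should be read as an assumption of this sketch: the test rejects when the statistic divided by $p(N-q-1)/(N-q-p)$ exceeds the upper-$\alpha_0$ quantile of a central $F_{p,N-q-p}$ distribution, and under the alternative that scaled statistic is treated as noncentral $F$ with noncentrality $c_N$. All function names are hypothetical, and `c_per_n` stands for $c_N/N$ from (8), supplied by the caller; the value 0.29 in the usage line is purely illustrative.

```python
import math

def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        m2 = 2 * m
        for num in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                    -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-12:
            break
    return h

def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(f, d1, d2, nc=0.0):
    """CDF of the (noncentral) F distribution as a Poisson mixture of beta CDFs."""
    if f <= 0.0:
        return 0.0
    x = d1 * f / (d1 * f + d2)
    weight = math.exp(-nc / 2.0)      # Poisson weight for j = 0
    total, j = 0.0, 0
    while j < 500:
        total += weight * beta_cdf(x, d1 / 2.0 + j, d2 / 2.0)
        j += 1
        weight *= (nc / 2.0) / j
        if nc == 0.0 or (j > nc / 2.0 and weight < 1e-14):
            break
    return total

def f_quantile(prob, d1, d2):
    """Quantile of the central F distribution by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < prob:
            lo = mid
        else:
            hi = mid
    return hi

def power(n, c_per_n, p, q, alpha0=0.05):
    """Approximate power at sample size n (requires n > q + p)."""
    d2 = n - q - p
    crit = f_quantile(1.0 - alpha0, p, d2)
    return 1.0 - f_cdf(crit, p, d2, nc=n * c_per_n)

def sample_size(c_per_n, p, q, alpha0=0.05, target_power=0.80, n_max=2000):
    """Smallest n whose approximate power reaches the target."""
    for n in range(q + p + 1, n_max + 1):
        if power(n, c_per_n, p, q, alpha0) >= target_power:
            return n
    raise ValueError("no sample size up to n_max reaches the target power")

n_needed = sample_size(c_per_n=0.29, p=3, q=3)   # illustrative inputs only
```

If a numerical library with a noncentral $F$ distribution is available, its routines would normally replace the hand-written `beta_cdf`, `f_cdf` and `f_quantile` helpers; they are written out here only to keep the sketch self-contained.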

Now we describe how the information needed in the sample size formula might be obtained when the alternative is quadratic ($p = 3$, (1)). In this case we first elicit the initial standardized proximal main effect given by $Z_1'\beta/\bar\sigma = \beta_1/\bar\sigma$. Second we elicit the averaged across time, standardized proximal main effect $\bar d = \frac{1}{T} \sum_{t=1}^T Z_t'\beta/\bar\sigma$. Lastly we elicit the time at which the proximal main effect is maximal, i.e. $\arg\max_t Z_t'\beta$. These three quantities can then be used to solve for $d = (d_1, d_2, d_3)'$. For example, in HeartSteps, we might want to determine the sample size to ensure 80% power when there is no initial treatment effect on the first day, and the maximum proximal main effect comes around day 29. We specify the expected availability, $E[I_t]$, to be constant in $t$, and $Z_t$ is given by (1). Table I gives sample sizes for HeartSteps under a variety of average standardized proximal main effects ($\bar d$).

Table I: Illustrative sample sizes for HeartSteps. The day of maximal treatment effect is 29. The expected availability is constant in $t$. Rows give $\bar d$; columns give $E[I_t]$.

$\bar d$    $E[I_t]=0.7$    $E[I_t]=0.6$    $E[I_t]=0.5$    $E[I_t]=0.4$
0.10        32              36              42              52
0.09        38              44              51              63
0.08        47              54              64              78
0.07        60              69              81              101
0.06        79              92              109             135
0.05        112             130             155             193

$\bar d = (1/T) \sum_{t=1}^T Z_t' d$ is the average standardized treatment effect.
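The elicitation described above can be turned into numbers as follows. This sketch assumes no initial effect ($d_1 = 0$), takes "day 29" to mean the 0-based day index $\lfloor (t-1)/5 \rfloor = 28$, and uses constant availability and randomization probability; these conventions and the function names are assumptions of the example, not statements from the paper. It returns the vector $d$ and the per-participant noncentrality $c_N/N$ from (8).

```python
def solve_quadratic_effect(avg_effect, peak_day, n_days=42):
    """Solve for d = (d1, d2, d3) with d1 = 0, a given average effect and peak day.

    peak_day is a 0-based day index x = floor((t - 1) / 5) (assumption of this sketch).
    """
    days = range(n_days)
    mean_x = sum(days) / n_days
    mean_x2 = sum(x * x for x in days) / n_days
    # Peak of d2*x + d3*x^2 at x = peak_day  =>  d2 = -2 * peak_day * d3.
    d3 = avg_effect / (mean_x2 - 2.0 * peak_day * mean_x)
    d2 = -2.0 * peak_day * d3
    return (0.0, d2, d3)

def noncentrality_per_participant(d, availability, rho, n_days=42, times_per_day=5):
    """c_N / N from (8) with constant E[I_t] and rho_t:
    sum over t of E[I_t] * rho * (1 - rho) * (Z_t' d)^2."""
    total = 0.0
    for t in range(1, n_days * times_per_day + 1):
        day = (t - 1) // times_per_day
        effect = d[0] + d[1] * day + d[2] * day * day
        total += availability * rho * (1.0 - rho) * effect ** 2
    return total

d = solve_quadratic_effect(avg_effect=0.10, peak_day=28)
c_per_n = noncentrality_per_participant(d, availability=0.5, rho=0.4)
```

Multiplying `c_per_n` by a candidate $N$ gives the noncentrality parameter $c_N$ that enters the power calculation.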

In the behavioral sciences a standardized effect size of 0.2 is considered small [ ]. Thus given the very small standardized effect sizes, the sample sizes given in Table I seem unbelievably small. Two points are worth making in this regard. First, the use of the alternative parametric hypothesis (7) in forming the test statistic implies that both between-subject as well as within-subject contrasts in proximal responses are used to detect the alternative. To see this, note that if we focused on only the first time point, $t = 1$, and tested $H_0: \beta(1) = 0$, then an appropriate test would be a two-sample $t$-test based on the proximal response $Y_2$, in which case the required sample size would be much larger (akin to the sample size for a two arm randomized-controlled trial in which 40% of the subjects are randomized to the treatment arm). This two-sample $t$-test uses only between-subject contrasts in proximal response to test the hypothesis. The required sample size would be even larger for a test of $H_0: \beta(1) = 0, \beta(2) = 0$ in which no relationship between $\beta(1)$ and $\beta(2)$ is assumed. Conversely the sample size would be smaller if one focused on detecting alternatives to $H_0: \beta(1) = 0, \beta(2) = 0$ of the form $H_1: \beta(1) = \beta(2) \ne 0$. The use of the alternative, $\beta(1) = \beta(2) \ne 0$, allows one to construct tests that use both between-subject as well as within-subject contrasts in proximal responses. Our approach is in between these two extremes in that we focus on detecting smooth, in $t$, alternatives to $H_0: \beta(t) = 0$ for all $t$. This permits use of both within- as well as between-subject contrasts in proximal responses. The assumption of a parsimonious alternative enables the use of smaller sample sizes. A second point is that, at this time, there is no general understanding of how large the standardized effect size should be for these “in-the-moment” effects of a treatment. Thus these standardized effects may or may not be considered small in future.

6 Simulations

We consider a variety of simulations with different generative models to evaluate the performance of the sample size formulae. In the simulations presented here, we use the same setup as in HeartSteps; see Appendix B for simulations in other setups (Table 4B). Specifically, the duration of the study is 42 days and there are 5 decision times within each day ($T = 210$). The randomization probability is 0.4, i.e. $\rho = \rho_t = P(A_t = 1) = 0.4$. The sample size formula is given in (8) and (9). All simulations are based on 1,000 simulated data sets.

Throughout this section the inputs to this sample size formula are

Zt=¡1,bt−1

5c,bt−1

5c2¢0

, the time-varying

availability pattern,

τt=E

[

α0=.

05 and power, 1

−β0=.

80. The value for the vector

is indirectly speciﬁed

via (a) the time at which the maximal standardized proximal main effect is achieved (

argmaxtZ0

), (b) the

averaged across time, standardized proximal main effect

d=1

TPT

t=1Z0

and (c) no initial standardized proximal

main effect (

1d=d1=

0). The test statistic used to evaluate the sample size formula is given by (6) in which

and Ztare set to ¡1,bt−1

5c,bt−1

5c2¢0.
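The indirect specification of $d$ through (a), (b) and (c) amounts to solving two linear equations in $d_2$ and $d_3$. The following is a minimal sketch in Python under two assumptions that are not stated in the text: that "day 29" corresponds to the day index $\lfloor (t-1)/5 \rfloor = 28$, and that the peak is the vertex of the quadratic. The helper names (`day_index`, `solve_d`, `effect`) are hypothetical.

```python
def day_index(t, per_day=5):
    """Zero-based day index floor((t - 1) / per_day) for decision time t = 1..T."""
    return (t - 1) // per_day

def z_vector(t, per_day=5):
    """Z_t = (1, k, k^2) with k the day index of decision time t."""
    k = day_index(t, per_day)
    return (1.0, float(k), float(k * k))

def solve_d(d_bar, max_day, n_days=42, per_day=5):
    """Solve for d = (d1, d2, d3) given the peak day, the average effect and d1 = 0."""
    k_star = max_day - 1  # assumption: day 29 maps to day index 28
    T = n_days * per_day
    mean_k = sum(day_index(t, per_day) for t in range(1, T + 1)) / T
    mean_k2 = sum(day_index(t, per_day) ** 2 for t in range(1, T + 1)) / T
    # Vertex condition:  d2 + 2 * d3 * k_star = 0  =>  d2 = -2 * k_star * d3
    # Average condition: d2 * mean_k + d3 * mean_k2 = d_bar
    d3 = d_bar / (mean_k2 - 2.0 * k_star * mean_k)
    d2 = -2.0 * k_star * d3
    return (0.0, d2, d3)

def effect(t, d):
    """Standardized proximal main effect Z_t'd at decision time t."""
    return sum(z * c for z, c in zip(z_vector(t), d))

d = solve_d(d_bar=0.10, max_day=29)
effects = [effect(t, d) for t in range(1, 211)]
```

Under these assumptions `d[1]` is positive and `d[2]` is negative, so the effect rises from zero, peaks on the chosen day, and then declines.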

The simulation results provided below illustrate that the sample size formula and associated test statistic are robust. For convenience we summarize the results here. When the working assumptions hold, then under a variety of availability patterns, i.e., time-varying values for $\tau_t = E[I_t]$ (see Figure 1), the desired Type 1 error and power are preserved. This is also the case when past treatment impacts availability. Furthermore the sample size formula is robust to deviations from the working assumptions, that is, it provides the desired Type 1 error and power; this is true for a variety of forms of the true proximal main effect of the treatment (see Figure 2), a variety of distributions and correlation patterns for the errors, and dependence of $Y_{t+1}$ on past treatment. In all cases the above robustness occurs as long as we provide an approximately true or conservative value for the standardized effect, $\bar{d}$, and an approximately true or conservative (low) value for the availability, $E[I_t]$.

In our simulations, we note several areas in which the sample size formula is less robust to the working assumption (c); this is when the error variance in $Y_{t+1}$ varies depending on whether treatment $A_t = 1$ or $A_t = 0$, or with time $t$. In particular, if the ratio $\mathrm{Var}[Y_{t+1} | I_t = 1, A_t = 1] / \mathrm{Var}[Y_{t+1} | I_t = 1, A_t = 0] > 1$, then the power is reduced. Also, if the average variance, $E\big[\mathrm{Var}[Y_{t+1} | I_t = 1, A_t]\big]$, varies greatly with time $t$, then the power is reduced. See below for details. Lastly, as would be expected for any sample size formula, using values of the standardized effect size, $d$, or availability that are larger than the truth degrades the power of the procedure.

6.1 Working Assumptions Underlying Sample Size Formula are True

First, we considered a variety of settings in which the working assumptions (a)-(d) hold and in which the inputs to the sample size formula are correct ($d$ is correct under the alternate hypothesis and the time-varying availability $E[I_t]$ is correct). Neither the working assumptions nor the inputs to the sample size formula specify the error distribution, thus in the simulation we consider 5 distributions for the errors in the model for $Y_{t+1}$, including independent normal, Student's $t$ and exponential distributions as well as two autoregressive (AR) processes; all of these error patterns satisfy $\bar{\sigma}^2 = 1$ (recall $\bar{\sigma}^2 = (1/T)\sum_{t=1}^{T} E\big[\mathrm{Var}(Y_{t+1} | I_t = 1, A_t)\big]$). Furthermore, neither the working assumptions nor the inputs to the sample size formula specify the dependence of the availability indicator, $I_t$, on past treatment. Thus we consider settings in which the availability decreases as the number of recent treatments increases. For brevity, we provide these standard results in Appendix B (Tables 2B and 3B). The results are generally quite good, with very few Type 1 error rates significantly above .05 and very few power levels significantly below .80.

[Figure 1 plot: four panels (Pattern 1, Pattern 2, Pattern 3, Pattern 4); horizontal axis: Time (decision time point, 0 to 200); vertical axis: Availability (0.40 to 0.60).]

Figure 1: Availability Patterns. The x-axis is decision time point and y-axis is the expected availability. Pattern 2 represents availability varying by day of the week with higher availability on the weekends and lower mid-week. The average availability is 0.5 in all cases.

6.2 Working Assumptions Underlying Sample Size Formula are False

Second, we considered a variety of settings in which the working assumptions are false but the inputs to the sample size formula are approximately correct, as follows. Throughout, $\bar{\sigma}^2 = 1$.

6.2.1 Working Assumption (a) is Violated.

Suppose that the true $E[Y_{t+1} | I_t = 1] \neq B_t'\alpha$ for any $\alpha \in R^q$. In particular, we consider the scenario in which there is a "weekend" effect on $Y_{t+1}$; see Appendix B for another scenario. The data are generated as follows:

$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t'd + \epsilon_t, \quad \text{if } I_t = 1$$

where the conditional mean is $\alpha(t) = B_t'\alpha + W_t\theta$ and $W_t$ is a binary variable: $W_t = 1$ if time $t$ falls on a weekend day, and $W_t = 0$ if it falls on a weekday. For simplicity, we assume each subject starts on Monday, i.e., for $k = 1, \ldots, 6$, $W_{i+35(k-1)} = 0$ when $i = 1, \ldots, 25$ and $W_{i+35(k-1)} = 1$ when $i = 26, \ldots, 35$ (recall that we assume in the simulation that there are 5 decision time points per day and the length of the study is 6 weeks). The values of $\{\alpha_i, i = 1, 2, 3\}$ are determined by fixing $\alpha(1)$ and setting $\arg\max_t \alpha(t) = T$ and $(1/T)\sum_{t=1}^{T} \alpha(t) - \alpha(1) = 1$. The error terms $\{\epsilon_t\}_{t=1}^{T}$ are i.i.d. $N(0, 1)$. The day of maximal proximal effect is 29. Additionally, different values of the averaged standardized treatment effect and four patterns of availability, as shown in Figure 1 and each with average 0.5, are considered. The type I error rate is not affected and thus is omitted here. The simulated power is reported in Table II; for more details see Table 6B in Appendix B.
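The weekend indicator $W_t$ described above follows directly from the decision-time index. A minimal sketch in Python, assuming (as in the text) a Monday start and 5 decision times per day; the function name `weekend_indicator` is a hypothetical helper:

```python
def weekend_indicator(t, per_day=5):
    """W_t = 1 if decision time t (1-based) falls on Saturday or Sunday, else 0.

    Assumes every subject starts on a Monday, so each week spans
    7 * per_day = 35 decision times, the last 10 of which are weekend times.
    """
    position_in_week = (t - 1) % (7 * per_day)  # 0..34
    return 1 if position_in_week >= 5 * per_day else 0

T = 210  # 6 weeks * 7 days * 5 decision times
W = [weekend_indicator(t) for t in range(1, T + 1)]
```

With $T = 210$ this marks 10 of every 35 decision times as weekend times, matching $W_{i+35(k-1)} = 1$ for $i = 26, \ldots, 35$.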

Table II: Simulated power when working assumption (a) is violated. The patterns of availability are provided in Figure 1.

                                   Availability Pattern
$\theta$          $\bar{d}$   Pattern 1   Pattern 2   Pattern 3
$0.5\bar{d}$      0.10        0.80        0.79        0.81
                  0.06        0.78        0.83        0.81
$1\bar{d}$        0.10        0.79        0.78        0.78
                  0.06        0.78        0.79        0.79
$1.5\bar{d}$      0.10        0.78        0.81        0.78
                  0.06        0.77        0.81        0.82
$2\bar{d}$        0.10        0.78        0.79        0.79
                  0.06        0.81        0.79        0.78

$\theta$ is the coefficient of $W_t$ in $E[Y_{t+1} | I_t = 1]$. $\bar{d} = (1/T)\sum_{t=1}^{T} Z_t'd$ is the average standardized treatment effect. Bold numbers are significantly (at .05 level) lower than .80.

6.2.2 Working Assumption (b) is Violated.

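Suppose that the true $\beta(t) \neq Z_t'\beta$ for any $\beta$. Instead the vector of standardized effects, $d$, used in the sample size formula corresponds to the projection of $d(t)$, that is,

$$d = \Big(\sum_{t=1}^{T} E[I_t] Z_t Z_t'\Big)^{-1} \sum_{t=1}^{T} E[I_t] Z_t \, d(t)$$

(recall $d(t) = \beta(t)/\bar{\sigma}$ and $\rho_t = \rho$). The sample size formula is used with the correct availability pattern, $\{E[I_t]\}_{t=1}^{T}$. The data for each simulated subject is generated sequentially as follows. For each time $t$,

$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) d(t) + \epsilon_t, \quad \text{if } I_t = 1$$

for the variety of $d(t) = \beta(t)/\bar{\sigma}$ and $E[I_t]$ patterns provided in Figure 2 and in Figure 1 respectively. The average availability is 0.5. The error terms $\{\epsilon_t\}_{t=1}^{T}$ are generated as i.i.d. $N(0, 1)$. The conditional mean, $E[Y_{t+1} | I_t = 1] = \alpha(t)$, is given by $\alpha(t) = \alpha_1 + \alpha_2 \lfloor \frac{t-1}{5} \rfloor + \alpha_3 \lfloor \frac{t-1}{5} \rfloor^2$, where $\alpha_1$, $\alpha_2$ and $\alpha_3$ are chosen so that $(1/T)\sum_{t} \alpha(t) - \alpha(1) = 1$ and $\arg\max_t \alpha(t) = T$.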
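The projection that defines $d$ when $d(t)$ is not exactly of the form $Z_t'd$ is an availability-weighted least squares fit. A minimal sketch in Python with NumPy, assuming constant availability 0.5; the sinusoidal `d_t` is a purely hypothetical effect shape chosen for illustration and is not one of the shapes in Figure 2, and `project_effect` is a hypothetical helper name:

```python
import numpy as np

T, per_day = 210, 5
t = np.arange(1, T + 1)
k = (t - 1) // per_day
# Rows of Z are Z_t' = (1, floor((t-1)/5), floor((t-1)/5)^2)
Z = np.column_stack([np.ones(T), k, k ** 2]).astype(float)

def project_effect(d_t, avail):
    """Return d = (sum_t E[I_t] Z_t Z_t')^{-1} sum_t E[I_t] Z_t d(t)."""
    lhs = Z.T @ (avail[:, None] * Z)   # sum_t E[I_t] Z_t Z_t'
    rhs = Z.T @ (avail * d_t)          # sum_t E[I_t] Z_t d(t)
    return np.linalg.solve(lhs, rhs)

avail = np.full(T, 0.5)                  # constant availability pattern
d_t = 0.2 * np.sin(np.pi * t / T)        # hypothetical non-quadratic d(t)
d = project_effect(d_t, avail)
```

When `d_t` is itself quadratic in the day index, the projection returns the generating coefficients, which is a convenient sanity check.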

Table III: Simulated Power when working assumption (b) is violated. The shape of the standardized proximal effect and pattern for availability are provided in Figure 2 and 1 respectively. The sample sizes are given on the right.

                                          Shape of d(t)
$\bar{d}$   Availability Pattern   Max    Maintained   Degraded     Sample Size
0.10        Pattern 1              15     0.78         0.79         43     39
                                   29     0.80         0.79         38     38
            Pattern 2              15     0.79         0.80         43     39
                                   29     0.78         0.79         38     38
            Pattern 3              15     0.81         0.77         45     41
                                   29     0.81         0.78         37     39
0.06        Pattern 1              15     0.81         0.79         111    100
                                   29     0.81         0.79         96     96
            Pattern 2              15     0.79         0.81         112    100
                                   29     0.79         0.80         96     96
            Pattern 3              15     0.78         0.81         116    106
                                   29     0.80         0.80         95     101

$\bar{d} = (1/T)\sum_{t=1}^{T} Z_t'd$ is the average standardized treatment effect. "Max" refers to the day of maximal proximal effect. Bold numbers are significantly (at .05 level) lower than .80.

[Figure 2 plot: four panels (Max = 15, Maintained; Max = 15, Severely Degraded; Max = 29, Maintained; Max = 29, Severely Degraded); horizontal axis: Time (0 to 200); vertical axis: Proximal Effect (0.00 to 0.20).]

Figure 2: Proximal Main Effects of Treatment, $\{d(t)\}_{t=1}^{T}$: representing maintained and severely degraded time-varying proximal treatment effects. The horizontal axis is the decision time point. The vertical axis is the standardized treatment effect. The "Max" in the titles refers to the day of maximal proximal effect. The average standardized proximal effect is $\bar{d} = 0.1$ in all plots.

The simulated powers are provided in Table III. In all cases the power is close to .80; this is because all of the proximal main effect patterns in Figure 2 are sufficiently well approximated by a quadratic in time. See Appendix B for other cases of $d(t)$ and details (Figure 5 and Table 9B).

6.2.3 Working Assumption (c) is Violated.

Suppose that $\mathrm{Var}[Y_{t+1} | I_t = 1, A_t] = A_t \sigma_{1t}^2 + (1 - A_t)\sigma_{0t}^2$ where $\sigma_{1t}/\sigma_{0t} \neq 1$. The sample size formula is used with the correct pattern for $\{Z_t'd, E[I_t]\}_{t=1}^{T}$. The data for each simulated subject is generated sequentially as follows. For each time $t$,

$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t'd + 1_{\{A_t = 1\}}\sigma_{1t}\epsilon_t + 1_{\{A_t = 0\}}\sigma_{0t}\epsilon_t, \quad \text{if } I_t = 1$$

where the average across time standardized proximal main effect, $\bar{d} = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$, is 0.1 and the day of maximal effect is equal to 22 or 29. The function $\alpha(t) = E[Y_{t+1} | I_t = 1]$ is as in the prior simulation. The availability is $\tau_t = 0.5$. The error terms $\{\epsilon_t\}$ follow a normal AR(1) process, e.g. $\epsilon_t = \phi\epsilon_{t-1} + v_t$, with the variance of $v_t$ scaled so that $\mathrm{Var}[\epsilon_t] = 1$. Define $\bar{\sigma}_t^2 = E\big[\mathrm{Var}[Y_{t+1} | I_t = 1, A_t]\big] \; (= \rho\sigma_{1t}^2 + (1 - \rho)\sigma_{0t}^2)$. Recall the average variance $\bar{\sigma}^2$ is given by $(1/T)\sum_{t=1}^{T} \bar{\sigma}_t^2$. We consider 3 time-varying trends for $\{\bar{\sigma}_t\}$ together with different values of $\sigma_{1t}/\sigma_{0t}$; see Figure 3. In each trend, $\bar{\sigma}_t^2$ is scaled such that $\bar{\sigma} = 1$; thus the standardized proximal main effect in the generative model is $Z_t'd$. In all cases, the simulated type I error rates are close to .05 and thus the table is omitted here (see Appendix B, Table 10B). The simulated power is given in Table IV.
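The two scaling steps in this setup (normalizing the trend so that the average of $\bar{\sigma}_t^2$ is 1, and splitting $\bar{\sigma}_t$ into $\sigma_{1t}$ and $\sigma_{0t}$ for a given ratio) can be written out explicitly. A minimal sketch in Python; the linear trend used here is hypothetical and is not one of the trends in Figure 3, and the helper names are illustrative only:

```python
import math

def split_sigma(sigma_bar_t, ratio, rho=0.4):
    """Return (sigma_1t, sigma_0t) with sigma_1t / sigma_0t = ratio and
    rho * sigma_1t**2 + (1 - rho) * sigma_0t**2 = sigma_bar_t**2."""
    sigma_0t = sigma_bar_t / math.sqrt(rho * ratio ** 2 + (1.0 - rho))
    return ratio * sigma_0t, sigma_0t

def rescale_trend(sigma_bar):
    """Rescale a trend so that the time-average of sigma_bar_t**2 equals 1."""
    mean_sq = sum(s * s for s in sigma_bar) / len(sigma_bar)
    return [s / math.sqrt(mean_sq) for s in sigma_bar]

T = 210
# Hypothetical increasing trend from 0.8 to 1.2 before rescaling
raw_trend = [0.8 + 0.4 * (t - 1) / (T - 1) for t in range(1, T + 1)]
trend = rescale_trend(raw_trend)
s1, s0 = split_sigma(trend[0], ratio=1.2)
```

With `ratio=1.0` the split returns `sigma_1t == sigma_0t == sigma_bar_t`, which recovers the case in which the variance does not depend on treatment.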

Table IV: Simulated Power when working assumption (c) is violated, $\sigma_{1t} \neq \sigma_{0t}$. The trends are provided in Figure 3. The availability is 0.5. The average proximal main effect is $\bar{d} = 0.1$ and the day of maximal effect is 22 or 29, and thus the associated sample sizes are 41 and 42.

                                   Max = 22 (N = 41)               Max = 29 (N = 42)
$\phi$   $\sigma_{1t}/\sigma_{0t}$   trend 1   trend 2   trend 3     trend 1   trend 2   trend 3
-0.6     0.8                         0.83      0.84      0.80        0.81      0.89      0.79
         1.0                         0.79      0.80      0.75        0.74      0.85      0.70
         1.2                         0.76      0.76      0.71        0.72      0.81      0.70
0        0.8                         0.85      0.82      0.79        0.81      0.88      0.78
         1.0                         0.79      0.81      0.74        0.77      0.86      0.72
         1.2                         0.77      0.77      0.71        0.70      0.83      0.70
0.6      0.8                         0.83      0.83      0.81        0.77      0.87      0.77
         1.0                         0.76      0.79      0.75        0.73      0.85      0.77
         1.2                         0.78      0.77      0.73        0.72      0.82      0.69

$\phi$ is the parameter in AR(1) for $\{\epsilon_t\}_{t=1}^{T}$. "Max" is the day in which the maximal proximal effect is attained. Bold numbers are significantly (at .05 level) lower than .80.

[Figure 3 plot: three panels (Trend 1, Trend 2, Trend 3); horizontal axis: Time (0 to 200); vertical axis: Sigma (0.8 to 1.2 for Trends 1 and 2, 0.8 to 1.4 for Trend 3).]

Figure 3: Trend of $\bar{\sigma}_t$: For all trends, $\bar{\sigma}_t^2$ is scaled so that $(1/T)\sum_{t=1}^{T} \bar{\sigma}_t^2 = 1$. In Trend 3, the variance, $\bar{\sigma}_t^2 = E\big[\mathrm{Var}[Y_{t+1} | I_t = 1, A_t]\big]$, peaks on weekends. In particular, $\bar{\sigma}_{7k+i} = 0.8$ for $i = 1, \ldots, 5$ and $\bar{\sigma}_{7k+i} = 1.5$ for $i = 6, 7$.

In the case of $\sigma_{1t} < \sigma_{0t}$, the simulated powers are slightly larger than 0.8, while the simulated powers are smaller than 0.8 in the case of $\sigma_{1t} > \sigma_{0t}$. The impact of $\bar{\sigma}_t$ on the power depends on the shape of the treatment effect: when $d(t)$ attains its maximum more than halfway through the study, at day 29, an increasing $\{\bar{\sigma}_t\}$, trend 1, lowers the power, while a decreasing $\{\bar{\sigma}_t\}$, trend 2, improves the power. When $d(t)$ attains a maximal effect midway through the study, either decreasing or increasing $\{\bar{\sigma}_t\}$ does not impact power. A large variation in $\bar{\sigma}_t$, e.g. trend 3, reduces the power in all cases. The differing autocorrelations of the errors, $\epsilon_t$, do not affect power; see a more detailed table in Appendix B, Table 10B.

6.2.4 Working Assumption (d) is Violated

We violate assumption (d) by making both the availability indicator, $I_t$, and the proximal response, $Y_{t+1}$, depend on past treatment and past proximal responses. The sample size formula is used with the correct value of $\{Z_t'd, E[I_t]\}_{t=1}^{T}$; in particular $d$ is determined by an average proximal main effect of $\bar{d} = 0.1$ and a day of maximal effect equal to 29 (with $d_1 = 0$ and $d_2$, $d_3$ set accordingly), together with a constant availability pattern equal to 0.5. The data for each simulated subject is generated as follows. Denote the cumulative treatment over the last 24 hours by $C_t = \sum_{j=1}^{5} A_{t-j} I_{t-j}$. At each time $t$,

$$I_t \sim \mathrm{Ber}\Big(\tau_t + \tau_t\eta_1 (C_t - E[C_t]) + \tau_t\eta_2 \, \mathrm{Trunc}\big(\tfrac{1}{5}\textstyle\sum_{j=1}^{5} \epsilon_{t-j}\big)\Big), \quad A_t \sim \mathrm{Ber}(\rho)$$

$$Y_{t+1} = \begin{cases} \alpha(t) + \gamma_1 \big[C_t - E[C_t | I_t = 1]\big] + (A_t - \rho)\big[Z_t'd + Z_t'd \, \gamma_2 (C_t - E[C_t | I_t = 1])\big] + \sigma^* \epsilon_t & \text{if } I_t = 1 \\ \alpha_0(t) + \epsilon_t & \text{if } I_t = 0 \end{cases}$$

where $\{\epsilon_t\}_{t=1}^{T}$ are i.i.d. $N(0, 1)$ and $\mathrm{Trunc}(x) := x 1_{|x| \le 1} + \mathrm{sign}(x) 1_{|x| > 1}$ (the truncation is used to ensure that $\tau_t + \tau_t\eta_1 (C_t - E[C_t]) + \tau_t\eta_2 \, \mathrm{Trunc}(\frac{1}{5}\sum_{j=1}^{5} \epsilon_{t-j}) \in [0, 1]$). Again $\alpha(t)$ is as in the prior simulation. $\sigma^*$ is calculated such that the average variance is equal to 1, i.e. $\bar{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T} E\big[\mathrm{Var}[Y_{t+1} | I_t = 1, A_t]\big] = 1$. Note that since $C_t$ is centered in both the model for $I_t$ and the model for $Y_{t+1}$, the standardized proximal main effect is $Z_t'd$ and $E[I_t] = \tau_t = 0.5$. $\alpha_0(t)$ is the conditional mean of $Y_{t+1}$ when $I_t = 0$. The form of $E[Y_{t+1} | I_t = 0]$ is not essential: only $Y_{s+1} - E[Y_{s+1} | I_s = 0]$ is used to generate $I_t$. In the simulation, $E[C_t | I_t = 1]$ and $\sigma^*$ are calculated by Monte Carlo methods. As before, the simulated type I error rates are not affected; see Table 11B in Appendix B. The simulated powers are provided in Table V.
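The pieces of the availability model in this simulation (the truncation, the 24-hour cumulative treatment, and the resulting Bernoulli probability) are simple to express in code. A minimal sketch in Python; the helper names are hypothetical, terms before the first decision time are treated as 0 (an assumption, since the text does not say how the start of the study is handled), and `mean_c_t` is passed in rather than estimated by Monte Carlo:

```python
def trunc(x):
    """Trunc(x) = x if |x| <= 1, otherwise sign(x)."""
    if abs(x) <= 1.0:
        return x
    return 1.0 if x > 0 else -1.0

def cumulative_treatment(A, I, t, window=5):
    """C_t = sum_{j=1}^{5} A_{t-j} I_{t-j}, with A and I indexed from 1
    (index 0 unused). Terms with t - j < 1 are taken to be 0 (assumption)."""
    return sum(A[t - j] * I[t - j] for j in range(1, window + 1) if t - j >= 1)

def availability_prob(tau_t, eta1, eta2, c_t, mean_c_t, past_eps):
    """tau_t + tau_t*eta1*(C_t - E[C_t]) + tau_t*eta2*Trunc(mean of the last 5 errors)."""
    avg_eps = sum(past_eps) / 5.0
    return tau_t + tau_t * eta1 * (c_t - mean_c_t) + tau_t * eta2 * trunc(avg_eps)

# Small illustration with made-up histories (index 0 is a placeholder)
A = [None, 1, 0, 1, 1, 1, 0]
I = [None, 1, 1, 0, 1, 1, 1]
c6 = cumulative_treatment(A, I, t=6)
p6 = availability_prob(0.5, -0.1, -0.1, c6, mean_c_t=1.0, past_eps=[5.0] * 5)
```

Because the truncated term is bounded by 1 in absolute value, negative $\eta_1$ and $\eta_2$ of the sizes in Table V keep the probability inside $[0, 1]$ for $\tau_t = 0.5$.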

Table V: Simulated Power when working assumption (d) is false. The expected availability is 0.5, the average proximal main effect is $\bar{d} = 0.1$ and the maximal effect is attained at day 29. The associated sample size is 42.

                                                         $\gamma_1$
Parameters in $I_t$                  $\gamma_2$    -0.1    -0.2    -0.3
$\eta_1 = -0.1$, $\eta_2 = -0.1$     -0.2          0.80    0.81    0.79
                                     -0.5          0.79    0.81    0.80
                                     -0.8          0.81    0.82    0.79
$\eta_1 = -0.2$, $\eta_2 = -0.1$     -0.2          0.78    0.82    0.79
                                     -0.5          0.81    0.77    0.77
                                     -0.8          0.81    0.79    0.78
$\eta_1 = -0.1$, $\eta_2 = -0.2$     -0.2          0.78    0.78    0.80
                                     -0.5          0.80    0.79    0.78
                                     -0.8          0.78    0.79    0.80

$\gamma_1$, $\gamma_2$ are parameters for the cumulative treatments in the model of $Y_{t+1}$; $\eta_1$, $\eta_2$ are parameters in the model of $I_t$. Bold numbers are significantly (at .05 level) less than .80.

6.3 Some Practical Guidelines

Third, it is critical to use conservative values of $\bar{d}$ and availability $E[I_t]$ in the sample size formula. It is not surprising that the quality of the sample size formula depends on accurate or conservative values of the standardized effects, $d$, as this is the case for all sample size formulas. Additionally, availability provides the number of decision points at which treatment might be provided per individual and thus the sample size formula should be sensitive to availability. To illustrate these points we consider a simulation in which the data is generated by

$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t'd + \epsilon_t, \quad \text{if } I_t = 1$$

where the $\epsilon_t$'s are i.i.d. standard normals and $\alpha(t)$ is as in the prior simulations. First suppose the scientist provides the correct availability pattern, $\{E[I_t]\}_{t=1}^{T}$, the correct time at which the maximal standardized proximal main effect is achieved ($\arg\max_t Z_t'd$) and the correct initial standardized proximal main effect ($Z_1'd = d_1 = 0$) but provides too low a value of the averaged across time, standardized proximal main effect $\bar{d} = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$. The simulated power is provided in Appendix B, Table 12B. The degradation in power is pronounced as might be expected.

Second, suppose the scientist provides the correct $\arg\max_t Z_t'd$, correct $Z_1'd = d_1 = 0$ and correct $\bar{d} = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$, and although the scientist's time-varying pattern of availability is correct, the magnitude is underestimated. The simulation result is in Appendix B, Table 13B. Again the degradation in power is pronounced.

7 Discussion

In this paper, we have introduced the use of micro-randomized trials in mobile health and have provided an

approach to determining the sample size. More sophisticated sample size procedures might be entertained.

Certainly it makes sense to include baseline information in the sample size procedure; for example, in HeartSteps a natural baseline variable is baseline step count. The inclusion of baseline variables in $B_t$ in the regression (2) is straightforward. An interesting generalization to the sample size procedure would allow scientists to include time-varying variables as covariates in $B_t$ in the regression (2). This might be a useful strategy for reducing the error variance.

Although this paper has focused on determining the sample size to detect the proximal main effect of a

treatment with a given power, micro-randomized studies provide data for a variety of interesting further analyses.

For example, it is of some interest to model and understand the predictors of the time-varying availability

indicator. In the case of HeartSteps we will know why the participant is unavailable (driving a car, already active

or has turned off the lock-screen messages) so we will be able to consider each type of availability indicator.

Other very interesting further analyses include assessing interactions between treatments, $A_t$, and context or past treatment, $A_s, s < t$, on the proximal response, $Y_{t+1}$. Also there is much interest in using this type of data to

construct “dynamic treatment regimes”; in this setting these are called Just-in-Time Adaptive Interventions [

The sequential micro-randomizations enhance all of these analyses by reducing causal confounding.

Appendix A Theoretical Results and Proofs

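Lemma 1 (Least Squares Estimator). The least squares estimators $\hat{\alpha}, \hat{\beta}$ are consistent estimators of $\tilde{\alpha}, \tilde{\beta}$ in (4) and (5). In particular, if $\beta(t) = Z_t'\beta^*$ for some vector $\beta^*$, then $\tilde{\beta} = \beta^*$. Under moment conditions, we have $\sqrt{N}(\hat{\beta} - \tilde{\beta}) \to N(0, \Sigma_\beta)$, where the asymptotic variance $\Sigma_\beta$ is given by

$$\Sigma_\beta = Q^{-1} W Q^{-1}$$

where

$$Q = \sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) Z_t Z_t', \qquad W = E\Big[\sum_{t=1}^{T} \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t \times \sum_{t=1}^{T} \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t'\Big]$$

and $\tilde{\epsilon}_t = Y_{t+1} - B_t'\tilde{\alpha} - Z_t'\tilde{\beta}(A_t - \rho_t)$.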

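Proof. It is easy to see that the least squares estimators satisfy

$$\hat{\theta} = (\hat{\alpha}, \hat{\beta}) = \Big(\mathbb{P}_N \sum_{t=1}^{T} I_t X_t X_t'\Big)^{-1} \Big(\mathbb{P}_N \sum_{t=1}^{T} I_t Y_{t+1} X_t\Big) \to \Big(\sum_{t=1}^{T} E(I_t X_t X_t')\Big)^{-1} \Big(\sum_{t=1}^{T} E(I_t Y_{t+1} X_t)\Big)$$

where $X_t' = (B_t', (A_t - \rho_t) Z_t') \in R^{1 \times (p+q)}$ is the covariate at time $t$. For each $t$,

$$E(I_t X_t X_t') = \begin{pmatrix} E[I_t] B_t B_t' & B_t Z_t' E[I_t(A_t - \rho_t)] \\ Z_t B_t' E[I_t(A_t - \rho_t)] & Z_t Z_t' E[I_t(A_t - \rho_t)^2] \end{pmatrix} = \begin{pmatrix} E[I_t] B_t B_t' & 0 \\ 0 & E[I_t]\rho_t(1 - \rho_t) Z_t Z_t' \end{pmatrix}$$

$$E(I_t Y_{t+1} X_t) = \begin{pmatrix} E[I_t Y_{t+1}] B_t \\ E[I_t Y_{t+1}(A_t - \rho_t)] Z_t \end{pmatrix} = \begin{pmatrix} E[I_t Y_{t+1}] B_t \\ \rho_t(1 - \rho_t) E[I_t] \beta(t) Z_t \end{pmatrix},$$

so that

$$\hat{\alpha} \to \Big(\sum_{t=1}^{T} E[I_t] B_t B_t'\Big)^{-1} \sum_{t=1}^{T} E[I_t Y_{t+1}] B_t = \Big(\sum_{t=1}^{T} E[I_t] B_t B_t'\Big)^{-1} \sum_{t=1}^{T} E[I_t] \alpha(t) B_t$$

$$\hat{\beta} \to \Big(\sum_{t=1}^{T} \rho_t(1 - \rho_t) E[I_t] Z_t Z_t'\Big)^{-1} \sum_{t=1}^{T} E[I_t Y_{t+1}(A_t - \rho_t)] Z_t = \Big(\sum_{t=1}^{T} \rho_t(1 - \rho_t) E[I_t] Z_t Z_t'\Big)^{-1} \sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) \beta(t) Z_t$$

as in (4) and (5). We can see that if $\beta(t) = Z_t'\beta^*$, then

$$\tilde{\beta} = \Big(\sum_{t=1}^{T} \rho_t(1 - \rho_t) E[I_t] Z_t Z_t'\Big)^{-1} \sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) \beta(t) Z_t = \Big(\sum_{t=1}^{T} \rho_t(1 - \rho_t) E[I_t] Z_t Z_t'\Big)^{-1} \sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) Z_t Z_t' \beta^* = \beta^*.$$

This is true even if $E[Y_{t+1} | I_t = 1] \neq B_t'\tilde{\alpha}$.

We can easily see that

$$\sqrt{N}(\hat{\theta} - \tilde{\theta}) = \sqrt{N}\Big\{\Big(\mathbb{P}_N \sum_{t=1}^{T} I_t X_t X_t'\Big)^{-1} \Big[\Big(\mathbb{P}_N \sum_{t=1}^{T} I_t Y_{t+1} X_t\Big) - \Big(\mathbb{P}_N \sum_{t=1}^{T} I_t X_t X_t'\Big)\tilde{\theta}\Big]\Big\}$$
$$= \sqrt{N}\Big\{E\Big[\sum_{t=1}^{T} I_t X_t X_t'\Big]^{-1} \Big(\mathbb{P}_N \sum_{t=1}^{T} I_t \tilde{\epsilon}_t X_t\Big)\Big\} + o_p(1), \qquad (10)$$

where $o_p(1)$ is a term that converges in probability to zero as $N$ goes to infinity. By the definition of $\tilde{\alpha}$ and $\tilde{\beta}$, we have

$$E\Big[\sum_{t=1}^{T} I_t \tilde{\epsilon}_t X_t\Big] = \begin{pmatrix} \sum_{t=1}^{T} E[I_t]\big(\alpha(t) - B_t'\tilde{\alpha}\big) B_t \\ \sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t)\big(\beta(t) - Z_t'\tilde{\beta}\big) Z_t \end{pmatrix} = 0.$$

So that under moment conditions, we have $\sqrt{N}(\hat{\theta} - \tilde{\theta}) \to N(0, \Sigma_\theta)$, where $\Sigma_\theta$ is given by

$$\Sigma_\theta = E\Big[\sum_{t=1}^{T} I_t X_t X_t'\Big]^{-1} E\Big[\sum_{t=1}^{T} I_t \tilde{\epsilon}_t X_t \times \sum_{t=1}^{T} I_t \tilde{\epsilon}_t X_t'\Big] E\Big[\sum_{t=1}^{T} I_t X_t X_t'\Big]^{-1} = \begin{bmatrix} \Sigma_\alpha & \Sigma_{\alpha\beta} \\ \Sigma_{\alpha\beta}' & \Sigma_\beta \end{bmatrix}.$$

In particular, $\hat{\beta}$ satisfies $\sqrt{N}(\hat{\beta} - \tilde{\beta}) \to N(0, \Sigma_\beta)$ and $\Sigma_\beta$ is given by

$$\Sigma_\beta = \Big(\sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) Z_t Z_t'\Big)^{-1} E\Big[\sum_{t=1}^{T} \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t \times \sum_{t=1}^{T} \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t'\Big] \Big(\sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) Z_t Z_t'\Big)^{-1} = Q^{-1} W Q^{-1}.$$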

Lemma 2 (Asymptotic Variance Under Working Assumptions). Assuming working assumptions (a)-(d) are true, then under the alternative hypothesis $H_1$ in (7), $\Sigma_\beta$ and $c_N$ are given by

$$\Sigma_\beta = \bar{\sigma}^2 \Big(\sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) Z_t Z_t'\Big)^{-1}$$

$$c_N = N d' \Big(\sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) Z_t Z_t'\Big) d.$$

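Proof. Note that under assumptions (b) and (c), we have $Z_t'\tilde{\beta} = \beta(t)$ and $\mathrm{Var}(Y_{t+1} | I_t = 1, A_t) = \bar{\sigma}^2$ for each $t$, and $\tilde{d} = d$. The middle term, $W$, in $\Sigma_\beta$ can be separated into two terms, i.e.

$$E\Big[\sum_{t=1}^{T} \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t \times \sum_{t=1}^{T} \tilde{\epsilon}_t I_t (A_t - \rho_t) Z_t'\Big] = \sum_{t=1}^{T} E\big[\tilde{\epsilon}_t^2 I_t (A_t - \rho_t)^2\big] Z_t Z_t' + \sum_{i \neq j} E\big[\tilde{\epsilon}_i \tilde{\epsilon}_j I_i I_j (A_i - \rho_i)(A_j - \rho_j)\big] Z_i Z_j'.$$

Under assumptions (a), (b) and (c), we have $E[\tilde{\epsilon}_t | I_t = 1, A_t] = 0$ and $E\big[\tilde{\epsilon}_t^2 I_t (A_t - \rho_t)^2\big] = E[I_t]\rho_t(1 - \rho_t)\bar{\sigma}^2$. Furthermore, suppose $i > j$; then

$$E\big[\tilde{\epsilon}_i \tilde{\epsilon}_j I_i I_j (A_i - \rho)(A_j - \rho)\big] = E\big[I_i I_j (A_j - \rho)(A_i - \rho)\big] \times E\big[\tilde{\epsilon}_i \tilde{\epsilon}_j | I_i = 1, I_j = 1, A_i, A_j\big] = 0,$$

because $A_i \perp \{I_i, I_j, A_j\}$ and the first term is 0. $W$ is then given by

$$W = \bar{\sigma}^2 \sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) Z_t Z_t'$$

so that $\Sigma_\beta = \bar{\sigma}^2 \big(\sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) Z_t Z_t'\big)^{-1}$ and $c_N = N(\bar{\sigma}\tilde{d})' \Sigma_\beta^{-1} (\bar{\sigma}\tilde{d}) = N d' \big(\sum_{t=1}^{T} E[I_t]\rho_t(1 - \rho_t) Z_t Z_t'\big) d$.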

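Remark: Working assumption (d) can be replaced by assuming that $E[Y_{t+1} | I_t = 1, A_t, I_s = 1, A_s] - E[Y_{t+1} | I_t = 1, A_t]$ does not depend on $A_t$ for any $s < t$, or by a Markov type of assumption, $Y_{t+1} \perp \{Y_{s+1}, I_s, A_s, s < t\} \,|\, I_t, A_t$. Either of them implies $E\big[\tilde{\epsilon}_i \tilde{\epsilon}_j I_i I_j (A_i - \rho_i)(A_j - \rho_j)\big] = 0$, so that $\Sigma_\beta$ and $c_N$ have the same simplified forms.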

Rationale for multiple of F distribution. The distribution of the quadratic form, $n(\bar{X} - \mu)' \hat{\Sigma}^{-1} (\bar{X} - \mu)$, constructed from a random sample of size $n$ of $N(\mu, \Sigma)$ random variables, in which $\hat{\Sigma}$ is the sample covariance matrix, follows a Hotelling's $T$-squared distribution. The Hotelling's $T$-squared distribution is a multiple of the F distribution, $\frac{d_1(d_1 + d_2 - 1)}{d_2} F_{d_1, d_2}$, in which $d_1$ is the dimension of $\mu$, and $d_1 + d_2$ is the sample size. Our small sample approximation replaces $d_1$ by $p$ (the number of parameters in the test statistic) and $d_2$ by $n - q - p$ (the sample size minus the number of nuisance parameters minus $d_1$).
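To make the F approximation concrete, the following Python sketch evaluates power from a noncentral F distribution with $p$ and $N - q - p$ degrees of freedom and noncentrality $c_N$ from Lemma 2, then searches for the smallest $N$ reaching 80% power. This is a sketch under stated assumptions, not the authors' calculator: formulas (8) and (9) are not reproduced in this section, so the power expression $1 - F_{p, N-q-p; c_N}\big(F^{-1}_{p, N-q-p}(1 - \alpha_0)\big)$ is assumed here (the multiplier $p(N-q-1)/(N-q-p)$ appears on both sides and cancels). The values in `d` are illustrative inputs for a quadratic effect with average near 0.1, and all function names are hypothetical.

```python
import numpy as np
from scipy import stats

def quadratic_design(T, per_day=5):
    """Rows are Z_t' = (1, k, k^2) with k = floor((t - 1) / per_day)."""
    k = (np.arange(1, T + 1) - 1) // per_day
    return np.column_stack([np.ones(T), k, k ** 2]).astype(float)

def noncentrality_per_subject(d, avail, rho, Z):
    """d' (sum_t E[I_t] rho (1 - rho) Z_t Z_t') d, i.e. c_N / N in Lemma 2."""
    effect = Z @ d  # standardized effects Z_t'd
    return float(np.sum(avail * rho * (1.0 - rho) * effect ** 2))

def power(N, d, avail, rho=0.4, alpha0=0.05, q=3):
    """Assumed power: P(noncentral F > central F critical value)."""
    Z = quadratic_design(len(avail))
    p = Z.shape[1]
    df2 = N - q - p
    c_N = N * noncentrality_per_subject(d, avail, rho, Z)
    crit = stats.f.ppf(1.0 - alpha0, p, df2)
    return 1.0 - stats.ncf.cdf(crit, p, df2, c_N)

def smallest_n(d, avail, target=0.80, q=3, n_max=2000):
    """Smallest N whose assumed power reaches the target."""
    p = 3  # Z_t has three components in this sketch
    for N in range(q + p + 1, n_max + 1):
        if power(N, d, avail, q=q) >= target:
            return N
    raise ValueError("target power not reached for N <= n_max")

d = np.array([0.0, 9.64e-3, -1.72e-4])  # illustrative quadratic effect
avail = np.full(210, 0.5)               # constant availability 0.5
N = smallest_n(d, avail)
```

Because the denominator degrees of freedom are $N - q - p$, the search starts at $N = q + p + 1$; lowering `avail` or shrinking `d` reduces $c_N$ and pushes the returned $N$ up.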

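Formula for adjusted $\hat{W}$ and $\hat{Q}$. Define an individual-specific residual vector $\hat{e}$ as the $T \times 1$ vector with $t$th entry $\hat{e}_t = Y_{t+1} - I_t B_t'\hat{\alpha} - I_t(A_t - \rho_t) Z_t'\hat{\beta}$. For each individual define the $t$th row of the $T \times (p+q)$ individual-specific matrix $X$ by $(I_t B_t', I_t(A_t - \rho_t) Z_t')$. Then define $H = X\big[\mathbb{P}_N X'X\big]^{-1} X'$. The matrix $\hat{Q}^{-1}$ is given by the lower right $p \times p$ block in the inverse of $\big[\mathbb{P}_N X'X\big]$; the matrix $\hat{W}$ is given by the lower right $p \times p$ block in $\mathbb{P}_N\big[X'(I - H)^{-1} \hat{e}\hat{e}' (I - H)^{-1} X\big]$.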

Appendix B Further Simulations and Details

B.1 Simulation Results When Working Assumptions are True

We conduct a variety of simulations in settings in which the working assumptions hold, the scientist provides the correct pattern for the expected availability, $\tau_t = E[I_t]$, and under the alternate, the standardized proximal main effect is $d(t) = Z_t'd$. Here we will mainly focus on the setup where the duration of the study is 42 days and there are 5 decision times within each day, but similar results can be obtained in different setups; see below. The randomization probability is 0.4, i.e. $\rho = \rho_t = P(A_t = 1) = 0.4$. The sample size formula is given in (8) and (9). The test statistic is given by (6) in which $B_t$ and $Z_t$ are both equal to $\left(1, \lfloor \frac{t-1}{5} \rfloor, \lfloor \frac{t-1}{5} \rfloor^2\right)'$. All simulations are based on 1,000 simulated data sets. The significance level is 0.05 and the desired power is 80%.

In the first simulation, the data for each simulated subject is generated sequentially as follows. For $t = 1, \ldots, T = 210$, $I_t$, $A_t$ and $Y_{t+1}$ are generated by

$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) d(t) + \epsilon_t, \quad \text{if } I_t = 1$$

where $d(t) = Z_t'd$ and $\tau_t$ are the same as in the sample size model. The conditional mean, $E[Y_{t+1} | I_t = 1] = \alpha(t)$, is given by $\alpha(t) = \alpha_1 + \alpha_2 \lfloor \frac{t-1}{5} \rfloor + \alpha_3 \lfloor \frac{t-1}{5} \rfloor^2$, where $\alpha_1$, $\alpha_2$ and $\alpha_3$ are chosen so that $(1/T)\sum_{t} \alpha(t) - \alpha(1) = 1$ and $\arg\max_t \alpha(t) = T$. We consider 5 differing distributions for the errors $\{\epsilon_t\}_{t=1}^{T}$: independent normal; independent (scaled) Student's $t$ distribution with 3 degrees of freedom; independent (centered) exponential distribution with $\lambda = 1$; a Gaussian AR(1) process, e.g. $\epsilon_t = \phi\epsilon_{t-1} + v_t$, where $v_t$ is white noise with variance $\sigma_v^2$ such that $\mathrm{Var}(\epsilon_t) = 1$; and lastly a Gaussian AR(5) process, e.g. $\epsilon_t = \frac{\phi}{5}\sum_{j=1}^{5} \epsilon_{t-j} + v_t$, where $v_t$ is white noise with variance $\sigma_v^2$ such that $\mathrm{Var}(\epsilon_t) = 1$. In all cases the errors are scaled to have mean 0 and variance 1 (i.e. $E[\epsilon_t | I_t = 1] = 0$, $\mathrm{Var}[\epsilon_t | A_t, I_t = 1] = 1$). Additionally four availability patterns, i.e. time-varying values for $\tau_t = E[I_t]$, are considered; see Figure 1. The simulated type 1 error rate and power when the duration of study is 42 days are reported in Tables 2B and 3B. The simulation results in other setups, e.g. when the length of the study is 4 weeks and 8 weeks, are reported in Table 4B. The associated sample sizes are given in Table 1B.
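Four of the five error distributions listed above can be generated with the Python standard library so that each has mean 0 and variance 1. This is a minimal sketch with hypothetical function names; the AR(5) case is omitted because its innovation variance does not have an equally simple closed form.

```python
import math
import random

def iid_normal(T, rng):
    """Independent standard normal errors."""
    return [rng.gauss(0.0, 1.0) for _ in range(T)]

def scaled_t3(T, rng):
    """Student's t with 3 df, divided by sqrt(3) because Var(t_3) = 3."""
    out = []
    for _ in range(T):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(1.5, 2.0)  # chi-square with 3 df
        out.append(z / math.sqrt(chi2 / 3.0) / math.sqrt(3.0))
    return out

def centered_exp(T, rng):
    """Exponential with rate 1 (mean 1, variance 1), shifted to mean 0."""
    return [rng.expovariate(1.0) - 1.0 for _ in range(T)]

def gaussian_ar1(T, phi, rng):
    """AR(1) errors eps_t = phi * eps_{t-1} + v_t with Var(eps_t) = 1.

    The innovation standard deviation sqrt(1 - phi^2) and a standard normal
    starting value make the process stationary with unit variance.
    """
    sd_v = math.sqrt(1.0 - phi ** 2)
    eps = [rng.gauss(0.0, 1.0)]
    for _ in range(T - 1):
        eps.append(phi * eps[-1] + rng.gauss(0.0, sd_v))
    return eps

rng = random.Random(2015)
errors = gaussian_ar1(210, phi=0.6, rng=rng)
```

Each generator returns one subject's error sequence of length $T$; a simulation would call the chosen generator once per simulated subject.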

Neither the working assumptions nor the inputs to the sample size formula specify the dependence of the availability indicator, $I_t$, on past treatment. In the second simulation, we therefore consider the setting in which the availability decreases as the number of treatments provided in the recent past increases. In particular, the data are generated as follows:

$$I_t \sim \mathrm{Ber}\Big(\tau_t + \eta \sum_{j=1}^{5} \big(A_{t-j} I_{t-j} - E[A_{t-j} I_{t-j}]\big)\Big), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) d(t) + \epsilon_t, \quad \text{if } I_t = 1$$

Note that since we center $\sum_{j=1}^{5} A_{t-j} I_{t-j}$ in the generative model of $I_t$, the expected availability is $\tau_t$. The specifications of $\alpha(t)$ and $\epsilon_t$ are the same as in the first simulation. The simulated type I error rate and power are reported in Table 5B.

B.2 Further Details When Working Assumptions are False

B.2.1 Working Assumption (a) is Violated.

Here we consider another setting in which the working assumption (a) is violated, i.e. the underlying true $E[Y_{t+1} | I_t = 1]$ follows a non-quadratic form (recall that $B_t$ is given by $\left(1, \lfloor \frac{t-1}{5} \rfloor, \lfloor \frac{t-1}{5} \rfloor^2\right)'$). The data is generated as follows:

$$I_t \sim \mathrm{Ber}(\tau_t), \quad A_t \sim \mathrm{Ber}(\rho)$$
$$Y_{t+1} = \alpha(t) + (A_t - \rho) Z_t'd + \epsilon_t, \quad \text{if } I_t = 1$$

where $\alpha(t) = E[Y_{t+1} | I_t = 1]$ is provided in Figure 4. For each case, $\alpha(t)$ has a fixed initial value $\alpha(1)$ and satisfies $(1/T)\sum_{t=1}^{T} \alpha(t) - \alpha(1) = 1$. The error terms $\{\epsilon_t\}_{t=1}^{T}$ are i.i.d. $N(0, 1)$. The day of maximal proximal effect is assumed to be 29. Additionally, different values of the averaged standardized treatment effect and four patterns of availability in Figure 1 with average 0.5 are considered. The simulation results are reported in Table 7B.

B.2.2 Additional Simulation Results When Other Working Assumptions are False

The main body of the paper reports part of the results when working assumptions (b), (c) and (d) are violated. Additional simulation results are provided here. In particular, the simulation result is reported in Table 9B when $d(t)$ follows other non-quadratic forms, i.e. working assumption (b) is false; see Figure 5. The simulated Type 1 error rate and power when working assumption (c) is false are reported in Table 10B. The simulated Type 1 error rate when working assumption (d) is violated is reported in Table 11B.

B.2.3 Simulation Results when $\bar{d}$ and $\bar{\tau}$ are misspecified.

As discussed in the paper, the first scenario considers the setting in which the scientist provides the correct availability pattern, $\{E[I_t]\}_{t=1}^{T}$, the correct time at which the maximal standardized proximal main effect is achieved ($\arg\max_t Z_t'd$) and the correct initial standardized proximal main effect ($Z_1'd = d_1 = 0$) but provides too low a value of the averaged across time, standardized proximal main effect $\bar{d} = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$. The simulated power is provided in Table 12B. In the second scenario, the scientist provides the correct $\arg\max_t Z_t'd$, correct $Z_1'd = d_1 = 0$ and correct $\bar{d} = \frac{1}{T}\sum_{t=1}^{T} Z_t'd$, and although the scientist's time-varying pattern of availability is correct, the magnitude, i.e. the average availability, is underestimated. The simulation result is in Table 13B.

Table 1B: Sample Sizes when the proximal treatment effect satisfies $d(t) = Z_t'd$. The significance level is 0.05. The desired power is 0.80.

                                                    $\bar{\tau} = 0.5$          $\bar{\tau} = 0.7$
                                                  Average Proximal Effect     Average Proximal Effect
Duration of Study   Availability Pattern   Max    0.10    0.08    0.06        0.10    0.08    0.06
4-week              Pattern 1              15     59      89      154         43      65      112
                                           22     60      91      158         44      66      114
                                           29     58      87      152         43      64      110
                    Pattern 2              15     59      89      154         43      65      112
                                           22     60      92      159         44      67      115
                                           29     58      89      154         43      64      111
                    Pattern 3              15     59      90      157         44      66      113
                                           22     63      96      167         46      69      119
                                           29     62      94      163         45      67      115
                    Pattern 4              15     59      89      155         43      65      112
                                           22     57      86      150         43      64      110
                                           29     54      82      142         41      61      105
6-week              Pattern 1              22     41      61      105         31      45      76
                                           29     42      64      109         32      47      79
                                           36     41      62      106         31      45      77
                    Pattern 2              22     41      61      105         31      45      76
                                           29     43      64      110         32      47      80
                                           36     42      62      107         31      46      77
                    Pattern 3              22     42      62      106         31      46      77
                                           29     44      66      114         33      48      82
                                           36     43      65      112         32      47      80
                    Pattern 4              22     41      62      106         31      45      77
                                           29     41      62      106         31      46      78
                                           36     40      59      101         30      44      74
8-week              Pattern 1              29     32      47      80          25      35      58
                                           36     33      49      84          26      37      61
                                           43     33      48      82          25      36      60
                    Pattern 2              29     32      47      80          25      35      58
                                           36     34      49      84          26      37      61
                                           43     33      49      82          25      36      60
                    Pattern 3              29     33      48      82          25      36      59
                                           36     35      51      87          26      38      63
                                           43     34      50      86          26      37      62
                    Pattern 4              29     33      48      81          25      36      59
                                           36     33      49      83          25      36      61
                                           43     32      47      80          25      35      59

"Max" is the day in which the maximal proximal effect is attained. $\bar{\tau} = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability.

Table 2B: Simulated Type I error rate (%) when working assumptions are true. Duration of the study is 6 weeks. The associated sample sizes are given in Table 1B.

Columns: Error Term; Availability Pattern; Max; then Average Proximal Effect 0.10, 0.08, 0.06 for $\bar{\tau} = 0.5$, followed by Average Proximal Effect 0.10, 0.08, 0.06 for $\bar{\tau} = 0.7$.

i.i.d. Normal

Pattern 1

22 3.8 4.5 4.9 4.6 5.3 4.8

29 4.7 6.0 4.6 4.0 3.2 5.0

36 5.0 5.4 4.9 4.3 4.8 4.6

Pattern 2

22 4.8 4.1 4.8 4.4 3.5 4.1

29 4.3 6.2 3.2 4.6 4.2 4.2

36 4.5 4.8 5.2 4.5 3.5 5.4

Pattern 3

22 4.7 4.5 6.3 4.4 4.9 4.9

29 4.1 5.1 4.6 4.3 6.0 5.6

36 4.7 4.4 4.6 4.1 5.1 4.4

Pattern 4

22 5.4 3.5 4.5 4.8 4.7 5.0

29 5.2 4.5 4.5 5.0 5.0 5.1

36 3.8 4.1 5.4 4.7 5.0 5.9

i.i.d. t dist. Pattern 1

22 4.3 4.4 3.2 4.1 4.1 5.2

29 5.0 3.8 3.2 3.7 4.2 6.3

36 4.3 4.5 4.0 5.0 5.7 5.4

i.i.d. Exp. Pattern 1

22 4.5 4.6 4.4 3.7 7.1 3.1

29 4.5 4.6 4.2 4.5 4.5 4.7

36 2.7 4.8 4.8 3.9 3.7 3.4

AR(1), φ=−0.6 Pattern 1

22 4.3 5.3 4.6 3.8 4.2 4.0

29 4.6 5.4 5.1 4.0 4.4 4.3

36 4.7 4.0 4.0 4.1 4.2 3.9

AR(1), φ=−0.3 Pattern 1

22 5.8 3.4 4.4 3.3 4.0 5.4

29 4.9 4.7 4.6 5.5 5.5 4.5

36 4.0 4.7 4.4 4.9 5.0 4.7

AR(1), φ=0.3 Pattern 1

22 4.6 4.6 4.9 4.3 5.4 4.1

29 4.8 5.3 4.1 4.3 4.2 5.2

36 3.6 3.9 4.9 4.8 4.9 4.9

AR(1), φ=0.6 Pattern 1

22 4.4 5.1 4.9 3.6 5.2 3.7

29 3.7 4.9 4.6 4.5 4.3 5.8

36 4.4 6.7 5.2 5.6 3.6 5.1

AR(5), φ=−0.6 Pattern 1

22 4.4 4.7 5.1 4.2 4.5 5.5

29 4.3 5.1 4.3 3.2 3.5 4.2

36 5.3 4.5 6.1 4.2 4.6 5.4

AR(5), φ=−0.3 Pattern 1

22 3.7 4.4 6.0 5.0 4.5 3.5

29 4.4 4.7 5.2 5.3 4.5 5.0

36 4.5 5.0 5.1 4.1 5.3 4.8

AR(5), φ=0.3 Pattern 1

22 5.3 4.3 5.7 4.8 4.1 4.3

29 3.9 4.8 4.1 4.0 4.3 4.9

36 4.2 5.5 5.1 3.6 4.5 3.6

AR(5), φ=0.6 Pattern 1

22 5.1 4.5 4.0 4.5 3.8 5.2

29 5.2 4.8 4.5 2.9 5.3 4.4

36 4.1 3.6 4.6 3.9 4.4 4.9

"Max" is the day in which the maximal proximal effect is attained. $\bar{\tau} = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability. $\phi$ is the parameter for the AR(1) and AR(5) processes. Bold numbers are significantly (at .05 level) greater than .05.

Table 3B: Simulated Power (%) when working assumptions are true. Duration of the study is 6 weeks. The associated sample sizes are given in Table 1B.

Columns: Error Term; Availability Pattern; Max; then Average Proximal Effect 0.10, 0.08, 0.06 for $\bar{\tau} = 0.5$, followed by Average Proximal Effect 0.10, 0.08, 0.06 for $\bar{\tau} = 0.7$.

i.i.d. Normal

Pattern 1

22 80.9 80.0 81.0 78.7 77.5 80.7

29 78.4 80.6 77.8 80.6 78.7 79.0

36 80.2 80.0 79.6 79.4 80.2 77.0

Pattern 2

22 80.3 78.1 78.8 80.6 79.6 79.8

29 80.3 79.1 80.2 77.4 79.9 79.9

36 76.8 79.3 80.2 78.5 78.4 80.0

Pattern 3

22 83.5 81.5 77.7 78.5 81.3 78.7

29 77.9 79.1 78.5 77.8 78.8 79.0

36 77.3 78.1 79.8 79.8 79.9 79.1

Pattern 4

22 77.2 79.7 81.8 80.2 79.0 78.8

29 80.1 78.8 80.3 79.4 80.6 80.1

36 80.5 79.4 80.0 78.9 79.9 78.1

i.i.d. t dist. Pattern 1

22 80.4 81.9 81.0 79.7 79.4 80.7

29 81.7 82.2 82.2 79.1 82.3 77.3

36 80.8 78.8 79.5 81.8 81.6 79.9

i.i.d. Exp. Pattern 1

22 81.0 81.6 79.7 77.2 80.1 80.2

29 80.6 82.4 80.3 79.0 79.8 80.3

36 82.1 79.8 80.8 79.8 79.5 80.3

AR(1), φ=−0.6 Pattern 1

22 78.5 80.3 78.5 82.3 79.8 80.3

29 78.7 80.8 80.0 77.1 79.5 77.9

36 77.7 80.3 80.2 78.2 77.4 83.6

AR(1), φ=−0.3 Pattern 1

22 77.9 79.0 79.6 80.0 77.8 80.4

29 77.9 79.1 80.0 79.0 78.0 78.4

36 78.1 81.2 80.2 80.7 80.9 78.4

AR(1), φ=0.3 Pattern 1

22 80.2 78.5 80.8 80.5 79.6 82.6

29 78.0 80.0 80.0 78.0 79.4 80.1

36 77.6 82.5 80.6 77.0 78.9 82.0

AR(1), φ=0.6 Pattern 1

22 80.4 79.8 79.5 80.7 79.5 82.0

29 78.9 81.5 79.3 79.5 81.3 79.5

36 79.5 78.4 78.8 80.1 77.9 77.8

AR(5), φ=−0.6 Pattern 1

22 79.9 79.4 80.0 78.7 79.2 79.4

29 80.0 78.3 79.1 76.8 79.6 79.3

36 80.5 80.0 79.2 80.1 78.0 80.4

AR(5), φ=−0.3 Pattern 1

22 79.2 80.4 81.9 81.3 77.7 79.1

29 80.0 82.3 80.5 80.5 82.2 79.2

36 75.9 78.7 79.3 79.0 79.4 79.9

AR(5), φ=0.3 Pattern 1

22 79.4 80.8 79.8 79.5 77.3 81.2

29 78.0 79.2 79.2 79.2 80.5 78.4

36 78.3 79.1 78.1 80.7 80.5 79.5

AR(5), φ=0.6 Pattern 1

22 80.2 77.9 80.3 78.6 78.4 80.3

29 76.9 79.3 80.2 79.1 80.6 80.5

36 78.7 84.0 80.1 78.8 79.3 78.8

“Max” is the day on which the maximal proximal effect is attained. $\bar{\tau} = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability. $\phi$ is the parameter for the AR(1) and AR(5) processes. Bold numbers are significantly (at the .05 level) less than .80.

Table 4B: Simulated type I error rate (%) and power (%) when the duration of the study is 4 weeks and 8 weeks. Error terms are i.i.d. N(0,1). The associated sample sizes are given in Table 1B.

Duration of Study | Availability Pattern | Max | $\bar{\tau} = 0.5$: Average Proximal Effect 0.10, 0.08, 0.06 | $\bar{\tau} = 0.7$: Average Proximal Effect 0.10, 0.08, 0.06

4-week

Pattern 1

15 4.1 4.7 6.3 5.3 5.5 5.6

22 5.2 4.4 4.7 3.1 4.7 4.4

29 5.7 5.5 5.6 4.3 4.2 4.2

Pattern 2

15 4.8 4.8 5.0 5.0 5.2 5.3

22 5.1 5.2 4.7 3.7 4.2 3.7

29 5.6 5.1 4.2 4.2 4.9 4.4

Pattern 3

15 4.7 5.0 4.6 6.1 5.3 5.1

22 4.9 4.0 6.6 4.2 3.8 4.1

29 4.7 4.3 5.1 4.6 5.8 3.5

Pattern 4

15 4.9 4.6 4.8 3.0 5.9 3.8

22 3.5 5.1 4.5 5.2 3.8 6.0

29 4.4 6.4 4.7 4.4 4.3 4.7

8-week

Pattern 1

29 4.1 4.6 4.0 5.3 5.0 5.9

36 3.3 4.7 6.5 4.6 5.4 4.3

43 3.2 5.1 5.2 5.0 3.4 5.0

Pattern 2

29 3.9 5.0 4.5 4.2 3.7 4.1

36 3.8 4.6 4.9 4.5 3.4 5.2

43 3.9 5.4 5.0 3.4 3.8 5.0

Pattern 3

29 4.6 4.2 3.7 5.2 4.1 4.0

36 4.3 5.1 6.1 4.6 5.0 4.6

43 4.6 6.0 4.1 5.0 4.9 4.0

Pattern 4

29 4.5 5.2 2.9 3.6 5.3 4.4

36 4.5 5.2 3.7 2.7 3.7 4.7

43 4.2 7.1 4.9 4.4 4.5 4.8

4-week

Pattern 1

15 80.4 79.0 78.5 79.6 82.8 80.3

22 78.8 78.7 80.7 78.7 79.2 80.0

29 76.2 80.6 80.1 81.3 80.1 79.1

Pattern 2

15 82.4 77.8 77.2 75.9 80.0 78.9

22 77.2 80.3 81.5 75.8 80.7 82.0

29 80.1 79.3 80.1 78.0 77.7 76.9

Pattern 3

15 79.3 79.8 79.2 79.1 76.5 80.8

22 80.0 80.0 79.0 79.0 80.2 81.8

29 79.4 80.7 79.3 80.4 79.6 79.2

Pattern 4

15 82.6 78.3 79.2 80.5 80.0 79.5

22 80.4 80.7 79.3 79.1 78.5 79.2

29 78.4 79.2 78.5 79.6 79.2 80.5

8-week

Pattern 1

29 79.7 77.3 76.4 79.1 82.2 79.6

36 78.8 78.6 81.5 80.3 78.2 79.6

43 80.4 77.8 78.7 79.1 80.3 80.1

Pattern 2

29 79.3 81.1 79.8 78.7 79.7 80.2

36 81.2 78.5 79.0 81.3 80.8 78.2

43 80.3 81.5 77.5 75.1 78.8 78.1

Pattern 3

29 80.1 79.0 77.1 78.2 80.4 78.8

36 79.5 79.9 79.6 80.0 80.8 79.6

43 80.5 79.5 79.6 79.4 79.4 80.2

Pattern 4

29 82.1 79.7 80.7 79.7 79.0 78.4

36 77.8 78.2 80.1 77.9 76.9 79.5

43 79.6 78.5 78.1 79.4 80.6 79.5

“Max” is the day on which the maximal proximal effect is attained. $\bar{\tau} = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability. Bold numbers are significantly (at the .05 level) greater than .05 (for type I error rate) and less than .80 (for power).
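The footnotes flag simulated rates that differ significantly from the nominal targets (.05 for type I error, .80 for power). A minimal sketch of one way such a check could be carried out is given below, using a one-sided normal approximation to the binomial. The number of simulated trials per cell (`N_SIM = 1000`) and the function names are assumptions made for illustration; the test actually used to choose the bold entries is not specified in this section.

```python
from math import sqrt
from statistics import NormalDist

N_SIM = 1000  # assumed number of simulated trials per table cell


def one_sided_p_value(rate_pct, target, n_sim=N_SIM, alternative="greater"):
    """Normal-approximation test of a simulated rate against a target proportion."""
    p_hat = rate_pct / 100.0
    se = sqrt(target * (1.0 - target) / n_sim)   # standard error under the null
    z = (p_hat - target) / se
    cdf = NormalDist().cdf(z)
    return 1.0 - cdf if alternative == "greater" else cdf


def flag_type1(rate_pct, n_sim=N_SIM, level=0.05):
    """True if the simulated type I error rate is significantly above .05."""
    return one_sided_p_value(rate_pct, 0.05, n_sim, "greater") < level


def flag_power(rate_pct, n_sim=N_SIM, level=0.05):
    """True if the simulated power is significantly below .80."""
    return one_sided_p_value(rate_pct, 0.80, n_sim, "less") < level


print(flag_type1(7.1), flag_type1(5.5), flag_power(75.1), flag_power(79.0))
```

Under the assumed 1000 replications, the standard error of a rate near .05 is about 0.7 percentage points and near .80 about 1.3 percentage points, so small deviations such as 5.5% or 79.0% would not be flagged.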

Table 5B: Simulated type I error rate (%) and power (%) when the availability indicator, $I_t$, depends on the recent past treatments with $\eta = -2$. The expected availability is constant in $t$ and equal to 0.5. The duration of the study is 42 days. The associated sample sizes are given in Table 1B.

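Error Term | $\phi$ | Max | Type I error rate (%): $\bar{\tau} = 0.5$ (Average Proximal Effect 0.10, 0.08, 0.06), $\bar{\tau} = 0.7$ (0.10, 0.08, 0.06) | Power (%): $\bar{\tau} = 0.5$ (Average Proximal Effect 0.10, 0.08, 0.06), $\bar{\tau} = 0.7$ (0.10, 0.08, 0.06)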

AR(1)

-0.6

22 4.8 5.4 4.5 3.4 5.8 3.7 81.5 78.0 79.4 81.7 77.9 80.7

29 4.7 4.4 4.2 4.0 4.9 4.6 79.4 80.9 80.7 78.2 79.2 79.7

36 4.3 5.3 4.4 4.2 3.9 5.5 79.5 81.5 79.8 80.2 79.2 80.7

-0.3

22 4.7 3.8 4.4 3.5 4.4 4.6 78.7 81.2 80.3 80.9 77.9 78.5

29 3.8 4.0 4.9 3.5 5.0 4.4 80.1 79.5 81.2 77.3 79.5 77.1

36 2.7 5.7 4.0 3.3 4.7 5.2 76.8 80.4 79.9 78.8 79.5 79.4

0.3

22 4.8 4.1 4.4 5.0 5.4 3.6 83.0 79.8 79.4 81.3 78.9 79.2

29 4.9 4.6 5.0 4.4 5.5 5.6 79.5 80.3 82.2 78.5 80.7 77.6

36 4.9 4.9 4.2 3.3 4.5 4.8 80.0 78.9 79.5 81.7 79.4 79.6

0.6

22 4.5 5.1 4.7 4.3 4.6 4.0 80.3 78.9 81.1 81.2 81.5 77.9

29 3.4 4.5 5.1 4.4 4.3 4.6 79.3 76.2 79.4 81.3 80.6 79.4

36 4.8 4.3 4.2 4.1 4.5 4.5 77.5 80.5 80.9 76.7 80.0 79.7

AR(5)

-0.6

22 4.8 4.6 4.3 3.7 4.7 3.5 81.9 81.4 81.6 79.8 78.3 78.9

29 6.5 4.1 4.5 3.3 4.5 4.8 77.5 79.9 79.8 79.9 79.3 79.3

36 3.5 5.7 4.4 4.6 4.7 5.7 77.8 80.8 78.6 77.9 79.2 81.7

-0.3

22 4.3 4.9 4.0 4.3 5.6 5.0 77.7 81.8 80.0 80.1 80.3 81.1

29 3.9 4.0 5.0 3.2 5.7 5.1 80.0 80.9 80.3 80.6 80.3 77.8

36 4.0 3.6 4.7 4.8 4.8 3.2 79.0 80.4 80.8 80.1 79.0 76.5

0.3

22 3.5 4.9 5.0 4.1 3.8 4.1 77.4 82.9 78.5 80.6 81.4 80.2

29 4.6 6.1 4.7 4.7 4.1 4.1 78.7 82.0 78.0 81.4 76.5 81.3

36 5.1 4.4 4.0 3.2 3.9 4.7 79.7 81.8 78.6 79.1 77.4 79.0

0.6

22 5.0 4.6 4.3 4.0 4.0 5.5 80.5 79.4 82.5 79.2 81.1 81.0

29 5.6 4.3 6.9 5.6 3.4 3.1 78.3 80.0 80.5 80.8 80.4 78.4

36 4.8 4.8 4.8 3.5 3.7 5.5 78.2 80.5 80.3 77.6 80.5 79.1

“Max” is the day on which the maximal proximal effect is attained. $\bar{\tau} = (1/T)\sum_{t=1}^{T} E[I_t]$ is the average availability. $\phi$ is the parameter for the AR(1) and AR(5) processes. Bold numbers are significantly (at the .05 level) greater than .05 (for type I error rate) and less than .80 (for power).

Table 6B: Simulated type I error rate (%) and power (%) when working assumption (a) is violated, Scenario 1. The average availability is 0.5. The day of maximal proximal effect is 29.

$\theta$ | $\bar{d}$ | Type I error rate (%) by availability pattern: Pattern 1, Pattern 2, Pattern 3, Pattern 4 | Power (%) by availability pattern: Pattern 1, Pattern 2, Pattern 3, Pattern 4

0.5

0.10 5.5 4.6 4.2 5.1 79.7 79.4 80.5 80.1

0.08 5.1 4.4 5.4 4.6 80.4 78.9 80.4 78.7

0.06 4.1 5.5 4.6 4.3 77.5 82.7 81.0 81.0

0.10 4.8 4.3 3.7 4.1 79.3 78.3 77.8 79.4

0.08 5.4 4.9 4.6 5.5 78.8 79.3 78.0 80.6

0.06 4.4 3.5 5.1 4.6 78.4 79.3 79.0 80.4

1.5

0.10 4.4 4.1 4.4 4.8 78.3 80.5 78.4 79.9

0.08 5.0 4.3 4.3 3.9 80.5 79.7 78.7 81.9

0.06 4.0 5.1 5.5 5.6 77.2 80.8 81.6 80.3

2

0.10 4.1 3.8 5.0 5.5 77.7 78.8 79.0 78.4

0.08 4.0 5.0 3.7 5.7 79.3 81.5 79.1 79.4

0.06 4.9 4.3 5.2 5.3 80.8 79.0 77.5 80.9

$\bar{d} = (1/T)\sum_{t=1}^{T} Z_t' d$ is the average proximal effect. $\theta$ is the coefficient of $E[Y_{t+1} \mid I_t = 1]$. Bold numbers are significantly (at the .05 level) greater than .05 (for type I error rate) and lower than .80 (for power).

Figure 4: Conditional expectation of the proximal response, $E[Y_{t+1} \mid I_t = 1]$, shown in three panels (Shape 1, Shape 2, Shape 3). The horizontal axis is the decision time point (Time). The vertical axis is $E[Y_{t+1} \mid I_t = 1]$ (Response).

Table 7B: Simulated type I error rate (%) and power (%) when working assumption (a) is violated, Scenario 2. The shapes of $\alpha(t) = E[Y_{t+1} \mid I_t = 1]$ and the patterns of availability are provided in Figure 4 and Figure 1, respectively. The average availability is 0.5. The day of maximal proximal effect is 29. The associated sample sizes are given in Table 1B.

$\alpha(t)$ | $\bar{d}$ | Type I error rate (%) by availability pattern: Pattern 1, Pattern 2, Pattern 3, Pattern 4 | Power (%) by availability pattern: Pattern 1, Pattern 2, Pattern 3, Pattern 4

Shape 1

0.10 3.6 4.3 4.7 4.5 77.4 80.2 76.2 75.9

0.08 5.9 3.8 4.1 3.4 79.7 80.1 78.9 80.6

0.06 4.6 5.7 4.2 6.5 78.7 76.3 78.3 79.9

Shape 2

0.10 4.8 4.8 4.4 4.1 79.2 79.1 78.5 79.7

0.08 3.9 5.4 4.8 4.3 77.7 80.4 76.8 80.9

0.06 5.1 5.5 3.4 4.9 78.3 79.4 79.8 80.2

Shape 3

0.10 5.1 3.5 4.3 4.4 79.1 79.4 75.6 78.0

0.08 4.6 5.0 6.2 3.8 78.3 78.1 79.1 78.1

0.06 4.8 4.4 5.4 4.2 78.0 78.3 79.8 77.7

$\bar{d} = (1/T)\sum_{t=1}^{T} Z_t' d$ is the average standardized treatment effect. Bold numbers are significantly (at the .05 level) greater than .05 (for type I error rate) and lower than .80 (for power).

Figure 5: Proximal main effects of treatment over decision times $t = 1, \dots, T$, representing maintained, slightly degraded and severely degraded time-varying treatment effects. Panels are arranged by effect shape (Maintained, Slightly Degraded, Severely Degraded) and by day of maximal effect (Max = 15, Max = 22, Max = 29). The horizontal axis is the decision time point (Time). The vertical axis is the standardized treatment effect (Proximal Effect). "Max" in the panel titles refers to the day of maximal effect. The average standardized proximal effect is 0.1 in all plots.

Table 8B: Sample sizes when working assumption (b) is violated. The shape of the standardized proximal effect, $d(t) = \beta(t)/\bar{\sigma}$, and the pattern of availability, $E[I_t]$, are provided in Figure 5 and Figure 1, respectively.

$\bar{d}$ | Availability Pattern | Max | $\bar{\tau} = 0.5$, shape of $d(t)$: Maintained, Slightly Degraded, Severely Degraded | $\bar{\tau} = 0.7$, shape of $d(t)$: Maintained, Slightly Degraded, Severely Degraded

0.10

15 43 41 39 32 31 29

Pattern 1 22 43 41 40 33 31 30

29 38 37 38 29 28 29

15 43 41 39 33 31 30

Pattern 2 22 43 42 40 33 31 30

29 38 37 38 29 28 29

15 45 43 41 33 32 31

Pattern 3 22 44 43 42 33 32 31

29 37 38 39 28 28 29

15 42 39 37 32 30 28

Pattern 4 22 44 41 39 33 31 30

29 39 38 38 29 28 28

0.08

15 65 61 58 48 45 43

Pattern 1 22 65 62 60 48 46 44

29 56 55 56 42 41 42

15 65 61 59 48 45 43

Pattern 2 22 65 62 60 48 46 44

29 56 55 56 42 41 42

15 67 64 62 49 47 45

Pattern 3 22 66 64 63 48 47 46

29 56 56 59 41 41 43

15 63 59 55 47 44 41

Pattern 4 22 65 61 58 48 45 43

29 58 56 56 43 41 41

0.06

15 111 105 100 81 76 73

Pattern 1 22 112 106 103 81 77 75

29 96 94 96 70 69 70

15 112 105 100 81 77 73

Pattern 2 22 112 106 103 81 77 75

29 96 94 96 70 68 70

15 116 111 106 83 79 76

Pattern 3 22 114 110 108 82 79 78

29 95 96 101 69 69 72

15 108 100 94 79 74 70

Pattern 4 22 112 105 99 81 76 73

29 100 95 95 72 69 70

“Max” is the day on which the maximal proximal effect is attained. $\bar{d} = (1/T)\sum_{t=1}^{T} Z_t' d$ is the average standardized treatment effect.

Table 9B: Simulated power (%) when working assumption (b) is violated. The shape of the standardized proximal effect, $d(t) = \beta(t)/\bar{\sigma}$, and the pattern of availability, $E[I_t]$, are provided in Figure 5 and Figure 1, respectively. The corresponding sample sizes are given in Table 8B.

$\bar{d}$ | Availability Pattern | Max | $\bar{\tau} = 0.5$, shape of $d(t)$: Maintained, Slightly Degraded, Severely Degraded | $\bar{\tau} = 0.7$, shape of $d(t)$: Maintained, Slightly Degraded, Severely Degraded

0.10

15 78.4 78.8 78.6 79.1 80.1 77.6

Pattern 1 22 80.4 79.5 81.2 80.0 76.9 77.9

29 80.4 79.2 78.9 77.3 76.8 81.1

15 78.6 79.9 79.9 80.1 80.4 81.3

Pattern 2 22 78.3 81.2 78.8 79.2 80.8 80.5

29 77.9 80.8 79.3 78.1 77.7 82.2

15 81.0 79.7 77.4 77.9 80.9 77.6

Pattern 3 22 78.9 79.1 80.0 79.7 79.4 75.9

29 80.9 77.5 77.7 80.6 79.2 78.5

15 79.7 79.5 77.9 79.5 81.7 78.0

Pattern 4 22 78.9 77.9 80.4 82.2 78.9 78.8

29 77.9 79.7 79.0 78.0 80.2 80.8

0.08

15 80.5 79.5 78.6 80.6 79.2 78.7

Pattern 1 22 78.9 78.7 78.8 78.9 80.7 80.3

29 76.6 78.0 78.3 80.9 78.6 80.4

15 81.0 79.3 78.7 82.0 80.5 80.1

Pattern 2 22 82.4 80.6 80.0 78.0 79.6 79.4

29 79.2 76.9 81.9 78.3 78.8 79.7

15 78.2 81.6 80.9 79.1 79.2 77.5

Pattern 3 22 80.9 79.5 78.6 79.2 78.3 81.4

29 80.4 79.3 77.5 77.9 80.2 82.3

15 79.4 79.4 78.1 78.6 77.4 78.8

Pattern 4 22 81.3 78.4 78.4 80.6 79.4 80.4

29 79.9 79.3 79.8 79.5 79.7 81.2

0.06

15 81.2 80.5 79.0 77.8 78.7 79.6

Pattern 1 22 80.0 81.7 79.8 80.7 80.5 80.2

29 81.2 78.7 79.2 81.2 79.7 80.1

15 78.7 77.5 81.4 80.7 81.0 80.7

Pattern 2 22 80.6 81.8 79.2 80.3 81.6 80.2

29 78.5 80.2 80.0 77.7 78.1 78.0

15 78.1 80.0 80.9 79.7 79.3 78.8

Pattern 3 22 81.2 80.2 80.0 78.3 82.2 81.1

29 79.6 81.6 79.8 80.2 81.6 76.9

15 78.2 79.8 78.9 79.5 77.3 79.2

Pattern 4 22 79.2 81.1 79.4 76.8 79.2 80.4

29 79.9 78.5 79.8 80.1 78.9 81.8

“Max” is the day on which the maximal proximal effect is attained. $\bar{d} = (1/T)\sum_{t=1}^{T} Z_t' d$ is the average standardized treatment effect. Bold numbers are significantly (at the .05 level) lower than .80.

Table 10B: Simulated type I error rate (%) and power (%) when working assumption (c) is violated. The trends of $\sigma_t$ are provided in Figure 3. The standardized average effect is 0.1. $E[I_t] = 0.5$. The associated sample sizes are 41 and 42 when the day of maximal effect is 22 and 29, respectively.

$\phi$ in AR(1) | $\sigma_{1t}/\sigma_{0t}$ | Max = 22: const., trend 1, trend 2, trend 3 | Max = 29: const., trend 1, trend 2, trend 3

0.8 4.1 4.3 3.3 5.4 4.7 4.9 2.8 4.1

-0.6 1.0 4.6 5.0 4.0 4.4 4.4 4.8 4.2 4.3

1.2 3.8 4.5 5.2 5.5 4.3 4.1 4.5 3.8

0.8 5.2 4.7 4.0 3.4 5.4 4.9 6.2 4.5

-0.3 1.0 4.9 4.5 4.5 4.3 5.2 5.1 4.0 3.7

1.2 5.4 4.6 4.1 3.8 3.7 5.2 4.3 5.0

0.8 4.8 4.0 4.1 3.9 4.7 5.2 3.7 4.2

0 1.0 5.4 4.0 5.8 3.9 4.1 4.0 5.9 5.7

1.2 4.4 4.9 5.0 4.6 3.7 4.8 4.4 4.9

0.8 5.3 4.4 4.7 3.2 4.6 5.4 5.6 4.1

0.3 1.0 5.5 4.0 3.4 3.7 5.0 4.6 4.0 3.6

1.2 3.8 4.5 4.5 4.8 4.5 5.0 6.2 4.3

0.8 5.5 3.9 5.3 3.8 3.3 3.5 5.1 4.2

0.6 1.0 4.0 3.7 5.2 5.1 4.8 5.1 5.0 4.7

1.2 4.5 5.1 4.6 4.9 4.5 4.4 4.7 4.8

0.8 82.8 82.7 83.7 79.9 83.6 80.6 88.7 79.2

-0.6 1.0 81.1 79.1 79.9 74.8 77.7 74.3 84.8 70.4

1.2 76.6 76.3 76.3 70.6 77.6 72.0 80.7 70.4

0.8 83.0 83.0 86.0 80.3 82.7 79.2 87.9 78.0

-0.3 1.0 77.6 81.4 80.7 74.9 79.1 74.5 86.0 73.7

1.2 78.2 76.9 77.3 73.4 74.4 71.2 81.0 70.7

0.8 84.6 84.6 82.1 79.0 81.8 81.5 88.0 78.0

0 1.0 80.1 78.6 80.9 73.6 77.7 76.5 86.1 71.8

1.2 76.0 76.7 77.4 70.6 74.5 69.9 83.4 69.6

0.8 83.6 79.7 84.6 79.7 82.1 81.7 88.2 75.7

0.3 1.0 81.5 82.4 82.3 73.9 79.5 74.6 85.1 71.5

1.2 74.8 76.6 78.2 71.1 75.5 71.1 82.5 70.1

0.8 81.4 83.1 83.5 80.5 83.1 77.1 86.6 76.9

0.6 1.0 80.7 76.4 79.0 74.8 80.4 73.4 84.7 76.8

1.2 77.0 77.5 77.0 73.5 74.4 72.5 81.6 69.4

$\phi$ is the parameter in the AR(1) process for $\{\epsilon_t\}_{t=1}^{T}$. Bold numbers are significantly (at the .05 level) greater than .05 (for type I error rate) and lower than .80 (for power).
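To make the role of $\phi$ concrete, the following sketch draws an error sequence $\{\epsilon_t\}_{t=1}^{T}$ from a stationary AR(1) process and checks its lag-1 autocorrelation. It is an illustration only: the unit marginal variance, the Gaussian innovations, and the function names `simulate_ar1` and `lag1_autocorrelation` are assumptions for this sketch and are not details confirmed by the tables above.

```python
import random
from math import sqrt


def simulate_ar1(n, phi, seed=0):
    """Draw n values from a stationary AR(1) process with unit marginal variance.

    e_t = phi * e_{t-1} + v_t, with v_t ~ N(0, 1 - phi**2) so that Var(e_t) = 1.
    The unit-variance scaling is an assumption made for this sketch.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie in (-1, 1) for stationarity")
    rng = random.Random(seed)
    innovation_sd = sqrt(1.0 - phi * phi)
    errors = [rng.gauss(0.0, 1.0)]          # start from the stationary distribution
    for _ in range(n - 1):
        errors.append(phi * errors[-1] + rng.gauss(0.0, innovation_sd))
    return errors


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


for phi in (-0.6, -0.3, 0.3, 0.6):
    eps = simulate_ar1(20000, phi)
    print(phi, round(lag1_autocorrelation(eps), 2))
```

A long series is used here only so that the sample autocorrelation sits close to $\phi$; a 42-day study would have far fewer decision points per participant.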

Table 11B: Simulated type I error rate (%) when working assumption (d) is violated. $E[I_t] = 0.5$. The average effect is 0.1 and the day of maximal effect is 29. N = 42.

Parameters in $I_t$ | $\gamma_2$ | $\gamma_1 = -0.1$ | $\gamma_1 = -0.2$ | $\gamma_1 = -0.3$

-0.2 5.7 3.2 3.9

η1=−0.1,η2= −0.1 -0.5 3.2 4.2 4.9

-0.8 4.2 5.1 5.5

-0.2 5.4 3.8 3.9

η1=−0.2,η2= −0.1 -0.5 4.4 4.4 4.8

-0.8 4.7 4.3 4.6

-0.2 4.5 5.0 5.0

η1=−0.1,η2= −0.2 -0.5 4.9 3.8 6.0

-0.8 4.7 4.8 4.8

$\eta_1, \eta_2$ are parameters in generating $I_t$. $\gamma_1, \gamma_2$ are coefficients in the model of $Y_{t+1}$. Bold numbers are significantly (at the .05 level) greater than .05.

Table 12B: Degradation in power when the average proximal effect is underestimated. The day of maximal effect is 29 and the average availability is 0.5.

$\bar{d}$ in Sample Size Formula | True $\bar{d}$ | Power (%) by availability pattern: Pattern 1, Pattern 2, Pattern 3, Pattern 4

0.10 (N = 42)

0.098 76.2 78.9 77.6 78.6

0.096 75.1 74.6 78.8 74.0

0.094 73.7 70.7 75.4 73.4

0.092 71.5 71.6 73.2 71.6

0.090 68.9 68.4 69.6 67.3

0.088 65.4 65.6 66.1 65.7

0.086 66.4 67.9 65.2 66.7

0.084 62.3 63.4 63.0 59.6

0.082 60.0 60.2 60.5 58.2

0.080 58.9 59.8 57.8 61.4

0.08(N = 64)

0.078 78.2 80.2 76.8 75.8

0.076 77.3 76.7 76.2 75.4

0.074 73.1 72.2 71.2 71.4

0.072 70.7 71.0 69.4 68.2

0.070 68.2 66.0 65.2 66.1

0.068 65.5 64.3 64.6 65.7

0.066 62.8 62.3 61.8 59.4

0.064 61.9 58.5 59.5 62.1

0.062 53.9 52.6 57.0 56.9

0.060 54.6 51.1 54.8 53.4

0.06(N = 109)

0.058 75.6 76.9 74.0 78.1

0.056 73.9 73.1 73.1 72.7

0.054 68.6 71.1 69.3 68.5

0.052 65.4 69.4 63.6 66.8

0.050 61.0 62.8 64.1 63.2

0.048 57.4 58.6 56.4 56.1

0.046 53.6 53.4 52.9 54.8

0.044 52.0 48.9 50.1 53.0

0.042 45.7 43.9 44.9 46.4

0.040 40.4 42.2 42.3 42.7

Table 13B: Degradation in power when the average availability is underestimated. The maximal treatment effect is attained on day 29 and the average proximal main effect is 0.1.

$(1/T)\sum_{t=1}^{T}\tau_t$ in Sample Size Formula | True $(1/T)\sum_{t=1}^{T}\tau_t$ | Power (%) by availability pattern: Pattern 1, Pattern 2, Pattern 3, Pattern 4

0.5 (N = 42)

0.048 76.4 81.7 76.0 78.2

0.046 73.9 75.5 73.6 75.8

0.044 70.6 72.1 71.0 71.7

0.042 70.8 70.6 74.2 70.3

0.040 70.3 69.2 65.7 68.6

0.038 66.0 66.8 67.8 67.0

0.036 64.0 62.5 62.4 62.9

0.034 60.8 61.3 59.4 63.9

0.032 56.4 59.2 54.7 59.8

0.030 51.4 53.1 51.9 54.5

0.7 (N = 32)

0.068 79.5 76.1 79.1 75.0

0.066 77.3 75.7 74.0 76.4

0.064 74.5 74.7 73.5 77.1

0.062 73.2 73.0 75.1 72.5

0.060 69.8 70.5 73.5 72.5

0.058 71.0 69.6 71.3 67.3

0.056 68.8 70.3 66.6 64.0

0.054 68.1 65.8 65.3 68.6

0.052 62.4 64.9 65.6 62.9

0.050 60.6 63.3 62.8 61.4

Acknowledgment

This research was supported by NIH grants P50DA010075, R01HL12544001 and grant U54EB020404 awarded

by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) through funds provided by the

trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov).


An Individualized, Data-Driven Digital Approach for Precision Behavior Change

Article

Apr 2019

Chronic disease now affects approximately half of the US population, causes 7 in 10 deaths, and accounts for roughly 80% of US health care expenditure. Because the root causes of chronic diseases are largely behavioral, effective therapies require frequent, individualized interventions that extend beyond the hospital and clinic to reach patients in their day-to-day lives. However, a mismatch currently exists between what the health care system is equipped to provide and the interventions necessary to effectively address the chronic disease burden. To remedy this health crisis, we present an individualized, data-driven digital approach for chronic disease management and prevention through precision behavior change. The rapid growth of information, biological, and communication technologies makes this an opportune time to develop digital tools that deliver precision interventions for health behavior change to address the chronic disease crisis. Building on this rapid growth, we propose a framework that includes the precise targeting of risk-producing behaviors using real-time sensing technology, machine learning data analysis to identify the most effective intervention, and delivery of that intervention with health-reinforcing feedback to provide real-time, individualized support to empower sustainable health behavior change.

Refining an algorithm-powered just-in-time adaptive weight control intervention: A randomized controlled trial evaluating model performance and behavioral outcomes

Article

Full-text available

Dec 2020
Health Informat J

Suboptimal weight losses are partially attributable to lapses from a prescribed diet. We developed an app (OnTrack) that uses ecological momentary assessment to measure dietary lapses and relevant lapse triggers and provides personalized intervention using machine learning. Initially, tension between user burden and complete data was resolved by presenting a subset of lapse trigger questions per ecological momentary assessment survey. However, this produced substantial missing data, which could reduce algorithm performance. We examined the effect of more questions per ecological momentary assessment survey on algorithm performance, app utilization, and behavioral outcomes. Participants with overweight/obesity ( n = 121) used a 10-week mobile weight loss program and were randomized to OnTrack-short (i.e. 8 questions/survey) or OnTrack-long (i.e. 17 questions/survey). Additional questions reduced ecological momentary assessment adherence; however, increased data completeness improved algorithm performance. There were no differences in perceived effectiveness, app utilization, or behavioral outcomes. Minimal differences in utilization and perceived effectiveness likely contributed to similar behavioral outcomes across various conditions.

Adaptive Systems for Internet-Delivered Psychological Treatments

Article

Full-text available

Jun 2020

Internet-Delivered Psychological Treatments (IDPT) are based on evidence-based psychological treatment models adjusted for interaction through the Internet. The use of Internet technologies has the potential to increase the availability of evidence-based mental health services for a far-reaching population with the use of fewer resources. Despite evidence that Internet Interventions can be effective means in mental health morbidities, most current IDPT systems are tunnel-based, inflexible, and non-interoperable. Hence it becomes essential to understand which elements of an Internet intervention contribute to effectiveness and treatment outcomes. By analogy, adaptation is a central aspect of successful face-to-face mental health therapy. Adaptability to patient needs can be regarded as an essential outcome factor in online systems for mental health interventions as well. While some aspects of rule-based and machine-learning-based adaptation have attracted attention in recent IDPT development, systematic reporting of core components, dimensions of adaptiveness, information architecture, and strategies for adaptation in the IDPT system are still lacking. To bridge this gap, we propose a model that shows how adaptive systems are represented in classical control theory and discuss how the model can be used to specify adaptive IDPT systems. Concerning the reference model, we outline the core components of adaptive IDPT systems, the main adaptive elements, dimensions of adaptiveness, information architecture applied to adaptive systems, and strategies used in the adaptation process. We also provide comprehensive guidelines on how to develop an adaptive IDPT system based on the Person-Based Approach.

Behavior science in the evolving world of digital health: considerations on anticipated opportunities and challenges

Article

Full-text available

Apr 2020

Digital health promises to increase intervention reach and effectiveness for a range of behavioral health outcomes. Behavioral scientists have a unique opportunity to infuse their expertise in all phases of a digital health intervention, from design to implementation. The aim of this study was to assess behavioral scientists’ interests and needs with respect to digital health endeavors, as well as gather expert insight into the role of behavioral science in the evolution of digital health. The study used a two-phased approach: (a) a survey of behavioral scientists’ current needs and interests with respect to digital health endeavors (n = 346); (b) a series of interviews with digital health stakeholders for their expert insight on the evolution of the health field (n = 15). In terms of current needs and interests, the large majority of surveyed behavioral scientists (77%) already participate in digital health projects, and from those who have not done so yet, the majority (65%) reported intending to do so in the future. In terms of the expected evolution of the digital health field, interviewed stakeholders anticipated a number of changes, from overall landscape changes through evolving models of reimbursement to more significant oversight and regulations. These findings provide a timely insight into behavioral scientists’ current needs, barriers, and attitudes toward the use of technology in health care and public health. Results might also highlight the areas where behavioral scientists can leverage their expertise to both enhance digital health’s potential to improve health, as well as to prevent the potential unintended consequences that can emerge from scaling the use of technology in health care.

Robust Estimation of Data-Dependent Causal Effects based on Observing a Single Time-Series

Preprint

Full-text available

Sep 2018

Consider the case that one observes a single time-series, where at each time t one observes a data record O(t) involving treatment nodes A(t), possible covariates L(t) and an outcome node Y(t). The data record at time t carries information for an (potentially causal) effect of the treatment A(t) on the outcome Y(t), in the context defined by a fixed dimensional summary measure Co(t). We are concerned with defining causal effects that can be consistently estimated, with valid inference, for sequentially randomized experiments without further assumptions. More generally, we consider the case when the (possibly causal) effects can be estimated in a double robust manner, analogue to double robust estimation of effects in the i.i.d. causal inference literature. We propose a general class of averages of conditional (context-specific) causal parameters that can be estimated in a double robust manner, therefore fully utilizing the sequential randomization. We propose a targeted maximum likelihood estimator (TMLE) of these causal parameters, and present a general theorem establishing the asymptotic consistency and normality of the TMLE. We extend our general framework to a number of typically studied causal target parameters, including a sequentially adaptive design within a single unit that learns the optimal treatment rule for the unit over time. Our work opens up robust statistical inference for causal questions based on observing a single time-series on a particular unit.

Using Ecological Momentary Assessment to Identify Common Smoking Situations Among Korean American Emerging Adults

Article

Full-text available

Aug 2016

The present study provides detailed contextual information about smoking habits among young Korean American smokers with the goal of characterizing situations where they are most at risk for smoking. Relevant situational factors included location, social context, concurrent activities, time of day, affective states, and food and beverage consumption. Using ecological momentary assessment (EMA) over 7 days, participants (N = 78) were instructed to respond to smoking prompts (n = 2614) and non-smoking prompts (n = 2136) randomly scheduled throughout the day. At each prompt, participants completed a short survey about immediate contextual factors. We used multilevel models to evaluate the association between contextual factors and smoking and further explored the distribution of smoking locations and concurrent activities across each social context and reason for smoking. Compared to non-smoking events, smoking events were associated with being outside, the presence of Korean friends, socializing, consuming alcohol, and experiencing more stress relative to one’s average stress level (all ps < .01). Further analyses involving only smoking events showed that when participants smoked alone, they were most commonly at home (50 %) and most often studying/working (28 %). When smoking with Korean friends, participants were most often outside (38 %) and socializing (54 %). When smoking to reduce craving, participants were most often at home (39 %) and studying/working (25 %). To our knowledge, this is the first study to provide detailed descriptions of real-time smoking contexts among young Korean American smokers. Information with this level of granularity is needed to develop effective just-in-time adaptive interventions (JITAIs) for smoking cessation.

Finding Significant Stress Episodes in a Discontinuous Time Series of Rapidly Varying Mobile Sensor Data

Conference Paper

May 2016

Management of daily stress can be greatly improved by delivering sensor-triggered just-in-time interventions (JITIs) on mobile devices. The success of such JITIs critically depends on being able to mine the time series of noisy sensor data to find the most opportune moments. In this paper, we propose a time series pattern mining method to detect significant stress episodes in a time series of discontinuous and rapidly varying stress data. We apply our model to 4 weeks of physiological, GPS, and activity data collected from 38 users in their natural environment to discover patterns of stress in real life. We find that the duration of a prior stress episode predicts the duration of the next stress episode and stress in mornings and evenings is lower than during the day. We then analyze the relationship between stress and objectively rated disorder in the surrounding neighborhood and develop a model to predict stressful episodes.

Towards personalized causal inference of medication response in mobile health: an instrumental variable approach for randomized trials with imperfect compliance

Article

Apr 2016

Mobile health studies can leverage longitudinal sensor data from smartphones to guide the application of personalized medical interventions. These studies are particularly appealing due to their ability to attract a large number of participants. In this paper, we argue that the adoption of an instrumental variable approach for randomized trials with imperfect compliance provides a natural framework for personalized causal inference of medication response in mobile health studies. Randomized treatment suggestions can be easily delivered to the study participants via electronic messages popping up on the smartphone screen. Under quite general assumptions, and as long as there is some degree of compliance between the randomized suggested treatment and the treatment actually adopted by the study participant, we can identify the causal effect of the actual treatment on the response in the presence of unobserved confounders. We implement a personalized randomization test for testing the null hypothesis of no causal effect of the treatment on the response, and evaluate its performance in a large scale simulation study encompassing data generated from linear and non-linear time series models under several simulation conditions. In particular, we evaluate the empirical power of the proposed test under varying degrees of compliance between the suggested and actual treatment adopted by the participant. Our empirical investigations provide encouraging results in terms of power and control of type I error rates.

Assessing time-varying causal effect moderation in the presence of cluster-level treatment effect heterogeneity and interference

Article

Nov 2022

The micro-randomized trial is a sequential randomized experimental design to empirically evaluate the effectiveness of mobile health intervention components that may be delivered at hundreds or thousands of decision points. Micro-randomized trials have motivated a new class of causal estimands, termed causal excursion effects, for which semiparametric inference can be conducted via a weighted, centred least-squares criterion (Boruvka et al., 2018). Causal excursion effects allow health scientists to answer important scientific questions about how intervention effectiveness may change over time or may be moderated by individual characteristics, time-varying context or past responses. Existing definitions and associated methods assume between-subject independence and noninterference. Deviations from these assumptions often occur. In this paper, causal excursion effects are revisited under potential cluster-level treatment effect heterogeneity and interference, where the treatment effect of interest may depend on cluster-level moderators. Utility of the proposed methods is shown by analysing data from a multi-institution cohort of first-year medical residents in the United States.

Medication Adherence and Monitoring

Chapter

Oct 2016

Non-adherence to a drug therapy is often the reason patients do not achieve their therapeutic goals. Measuring and monitoring drug adherence is therefore important, both to understand patients’ adherence patterns and behavior and to provide supportive measures that enhance or reestablish adherence to a prescribed regimen. A variety of Adherence Measurement and Monitoring Systems (AMS) exist, although no single AMS or method is considered the gold standard today. These range from simple apps that issue alerts and reminders to patients up to AMS that facilitate automated, telemedical interactions between the physician and the patient to initiate corrective interventions by making use of a variety of data sources. For patients with several morbidities, co-morbidities, and disabilities, finding an appropriate AMS remains a challenge.

Small Sample Performance of Bias-corrected Sandwich Estimators for Cluster-Randomized Trials with Binary Outcomes

Article

Jan 2015
STAT MED

The sandwich estimator in the generalized estimating equations (GEE) approach underestimates the true variance in small samples and consequently results in inflated type I error rates in hypothesis testing. This fact limits the application of the GEE in cluster-randomized trials (CRTs) with few clusters. Under various CRT scenarios with correlated binary outcomes, we evaluate the small sample properties of the GEE Wald tests using bias-corrected sandwich estimators. Our results suggest that the GEE Wald z-test should be avoided in the analyses of CRTs with few clusters even when bias-corrected sandwich estimators are used. With t-distribution approximation, the Kauermann and Carroll (KC)-correction can keep the test size to nominal levels even when the number of clusters is as low as 10 and is robust to the moderate variation of the cluster sizes. However, in cases with large variations in cluster sizes, the Fay and Graubard (FG)-correction should be used instead. Furthermore, we derive a formula to calculate the power and minimum total number of clusters one needs using the t-test and KC-correction for the CRTs with binary outcomes. The power levels as predicted by the proposed formula agree well with the empirical powers from the simulations. The proposed methods are illustrated using real CRT data. We conclude that, with appropriate control of type I error rates under small sample sizes, the GEE approach is recommended for CRTs with binary outcomes because of its fewer assumptions and robustness to misspecification of the covariance structure.

Integrating addiction treatment into primary care using mobile health technology: Protocol for an implementation research study

Article

May 2014
IMPLEMENT SCI

Background: Healthcare reform in the United States is encouraging Federally Qualified Health Centers and other primary-care practices to integrate treatment for addiction and other behavioral health conditions into their practices. The potential of mobile health technologies to manage addiction and comorbidities such as HIV in these settings is substantial but largely untested. This paper describes a protocol to evaluate the implementation of an E-Health integrated communication technology delivered via mobile phones, called Seva, into primary-care settings. Seva is an evidence-based system of addiction treatment and recovery support for patients and real-time caseload monitoring for clinicians. Methods/Design: Our implementation strategy uses three models of organizational change: the Program Planning Model to promote acceptance and sustainability, the NIATx quality improvement model to create a welcoming environment for change, and Rogers’s diffusion of innovations research, which facilitates adaptations of innovations to maximize their adoption potential. We will implement Seva and conduct an intensive, mixed-methods assessment at three diverse Federally Qualified Healthcare Centers in the United States. Our non-concurrent multiple-baseline design includes three periods: pretest (ending in four months of implementation preparation), active Seva implementation, and maintenance, with implementation staggered at six-month intervals across sites. The first site will serve as a pilot clinic. We will track the timing of intervention elements and assess study outcomes within each dimension of the Reach, Effectiveness, Adoption, Implementation, and Maintenance framework, including effects on clinicians, patients, and practices. Our mixed-methods approach will include quantitative (e.g., interrupted time-series analysis of treatment attendance, with clinics as the unit of analysis) and qualitative (e.g., staff interviews regarding adaptations to implementation protocol) methods, and assessment of implementation costs. Discussion: If implementation is successful, the field will have a proven technology that helps Federally Qualified Health Centers and affiliated organizations provide addiction treatment and recovery support, as well as a proven strategy for implementing the technology. Seva also has the potential to improve core elements of addiction treatment, such as referral and treatment processes. A mobile technology for addiction treatment and accompanying implementation model could provide a cost-effective means to improve the lives of patients with drug and alcohol problems. Trial registration: ClinicalTrials.gov (NCT01963234).

A Smartphone Application to Support Recovery From Alcoholism: A Randomized Clinical Trial

Article

Mar 2014

Importance: Patients leaving residential treatment for alcohol use disorders are not typically offered evidence-based continuing care, although research suggests that continuing care is associated with better outcomes. A smartphone-based application could provide effective continuing care. Objective: To determine whether patients leaving residential treatment for alcohol use disorders with a smartphone application to support recovery have fewer risky drinking days than control patients. Design, Setting, and Participants: An unmasked randomized clinical trial involving 3 residential programs operated by 1 nonprofit treatment organization in the Midwestern United States and 2 residential programs operated by 1 nonprofit organization in the Northeastern United States. In total, 349 patients who met the criteria for DSM-IV alcohol dependence when they entered residential treatment were randomized to treatment as usual (n = 179) or treatment as usual plus a smartphone (n = 170) with the Addiction–Comprehensive Health Enhancement Support System (A-CHESS), an application designed to improve continuing care for alcohol use disorders. Interventions: Treatment as usual varied across programs; none offered patients coordinated continuing care after discharge. A-CHESS provides monitoring, information, communication, and support services to patients, including ways for patients and counselors to stay in contact. The intervention and follow-up period lasted 8 and 4 months, respectively. Main Outcomes and Measures: Risky drinking days, defined as the number of days during which a patient’s drinking in a 2-hour period exceeded 4 standard drinks for men and 3 standard drinks for women, with a standard drink defined as one that contains roughly 14 g of pure alcohol (12 oz of regular beer, 5 oz of wine, or 1.5 oz of distilled spirits). Patients were asked to report their risky drinking days in the previous 30 days on surveys taken 4, 8, and 12 months after discharge from residential treatment. Results: For the 8 months of the intervention and 4 months of follow-up, patients in the A-CHESS group reported significantly fewer risky drinking days than did patients in the control group, with a mean of 1.39 vs 2.75 days (mean difference, 1.37; 95% CI, 0.46-2.27; P = .003). Conclusions and Relevance: The findings suggest that a multifeatured smartphone application may have significant benefit to patients in continuing care for alcohol use disorders. Trial Registration: clinicaltrials.gov Identifier: NCT01003119

Mobile Health: Revolutionizing Healthcare Through Transdisciplinary Research

Article

Jan 2013

Mobile health (mHealth) seeks to improve individuals' health and well-being by continuously monitoring their status, rapidly diagnosing medical conditions, recognizing behaviors, and delivering just-in-time interventions, all in the user's natural mobile environment. The Web extra at http://youtu.be/o2mieSywutY is an audio interview in which Santosh Kumar, Wendy Nilsen, and Mani Srivastava discuss the path toward realizing mobile health systems.

The Effectiveness of Mobile-Health Technology-Based Health Behaviour Change or Disease Management Interventions for Health Care Consumers: A Systematic Review

Article

Jan 2013
PLOS MED

Mobile technologies could be a powerful medium for providing individual level support to health care consumers. We conducted a systematic review to assess the effectiveness of mobile technology interventions delivered to health care consumers. We searched for all controlled trials of mobile technology-based health interventions delivered to health care consumers using MEDLINE, EMBASE, PsycINFO, Global Health, Web of Science, Cochrane Library, UK NHS HTA (Jan 1990-Sept 2010). Two authors extracted data on allocation concealment, allocation sequence, blinding, completeness of follow-up, and measures of effect. We calculated effect estimates and used random effects meta-analysis. We identified 75 trials. Fifty-nine trials investigated the use of mobile technologies to improve disease management and 26 trials investigated their use to change health behaviours. Nearly all trials were conducted in high-income countries. Four trials had a low risk of bias. Two trials of disease management had low risk of bias; in one, antiretroviral (ART) adherence, use of text messages reduced high viral load (>400 copies), with a relative risk (RR) of 0.85 (95% CI 0.72-0.99), but no statistically significant benefit on mortality (RR 0.79 [95% CI 0.47-1.32]). In a second, a PDA based intervention increased scores for perceived self care agency in lung transplant patients. Two trials of health behaviour management had low risk of bias. The pooled effect of text messaging smoking cessation support on biochemically verified smoking cessation was RR 2.16 (95% CI 1.77-2.62). Interventions for other conditions showed suggestive benefits in some cases, but the results were not consistent. No evidence of publication bias was demonstrated on visual or statistical examination of the funnel plots for either disease management or health behaviours. To address the limitation of the older search, we also reviewed more recent literature. Text messaging interventions increased adherence to ART and smoking cessation and should be considered for inclusion in services. Although there is suggestive evidence of benefit in some other areas, high quality adequately powered trials of optimised interventions are required to evaluate effects on objective outcomes.

Longitudinal data analysis using generalized linear models

Article

Jan 1986
BIOMETRIKA

Dynamic Models of Behavior for Just-in-Time Adaptive Interventions

Article

Jul 2014

A new approach to causal inference in mortality studies with sustained exposure periods - Application to control of the healthy worker survivor effect

Article

Dec 1987
Math Model

James M Robins

In observational cohort mortality studies with prolonged periods of exposure to the agent under study, it is not uncommon for risk factors for death to be determinants of subsequent exposure. For instance, in occupational mortality studies date of termination of employment is both a determinant of future exposure (since terminated individuals receive no further exposure) and an independent risk factor for death (since disabled individuals tend to leave employment). When current risk factor status determines subsequent exposure and is determined by previous exposure, standard analyses that estimate age-specific mortality rates as a function of cumulative exposure may underestimate the true effect of exposure on mortality whether or not one adjusts for the risk factor in the analysis. This observation raises the question, which if any population parameters can be given a causal interpretation in observational mortality studies? In answer, we offer a graphical approach to the identification and computation of causal parameters in mortality studies with sustained exposure periods. This approach is shown to be equivalent to an approach in which the observational study is identified with a hypothetical double-blind randomized trial in which data on each subject's assigned treatment protocol has been erased from the data file. Causal inferences can then be made by comparing mortality as a function of treatment protocol, since, in a double-blind randomized trial missing data on treatment protocol, the association of mortality with treatment protocol can still be estimated. We reanalyze the mortality experience of a cohort of arsenic-exposed copper smelter workers with our method and compare our results with those obtained using standard methods. We find an adverse effect of arsenic exposure on all-cause and lung cancer mortality which standard methods fail to detect.

Longitudinal Data Analysis Using Generalized Linear Models

Article

Apr 1986
BIOMETRIKA

This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood.

Behavioral Impacts of Sequentially versus Simultaneously Delivered Dietary Plus Physical Activity Interventions: the CALM Trial

Article

Apr 2013
ANN BEHAV MED

Background: Few studies have evaluated how to combine dietary and physical activity (PA) interventions to enhance adherence. Purpose: We tested how sequential versus simultaneous diet plus PA interventions affected behavior changes. Methods: Two hundred participants over age 44 years not meeting national PA and dietary recommendations (daily fruit and vegetable servings and percent of calories from saturated fat) were randomized to one of four 12-month telephone interventions: sequential (exercise first or diet first), simultaneous, or attention control. At 4 months, the other health behavior was added in the sequential arms. Results: Ninety-three percent of participants were retained through 12 months. At 4 months, only exercise first improved PA, and only the simultaneous and diet-first interventions improved dietary variables. At 12 months, mean levels of all behaviors in the simultaneous arm met recommendations, though not in the exercise- and diet-first arms. Conclusions: We observed a possible behavioral suppression effect of early dietary intervention on PA that merits investigation.
