Assessing Clients in Their Natural Environments With Electronic Diaries:
Rationale, Benefits, Limitations, and Barriers
Thomas M. Piasecki
University of Missouri—Columbia
Michael R. Hufford
Marika Solhan and Timothy J. Trull
University of Missouri—Columbia
Increasingly, mobile technologies are used to gather diary data in basic research and clinical studies. This
article considers issues relevant to the integration of electronic diary (ED) methods in clinical assessment.
EDs can be used to gather rich information regarding clients’ day-to-day experiences, aiding diagnosis,
treatment planning, treatment implementation, and treatment evaluation. The authors review the benefits
of using diary methods in addition to retrospective assessments, and they review studies assessing
whether EDs yield higher quality data than conventional, less expensive paper–pencil diaries. Practical
considerations—including what platforms can be used to implement EDs, what features they should
have, and considerations in designing diary protocols for sampling different types of clinical phenom-
ena—are described. The authors briefly illustrate with examples some ways in which ED data could be
summarized for clinical use. Finally, the authors consider barriers to clinical adoption of EDs. EDs are
likely to become increasingly popular tools in routine clinical assessment as clinicians become more
familiar with the logic of diary designs; as software packages evolve to meet the needs of clinicians; and
as mobile technologies become ubiquitous, robust, and inexpensive.
Keywords: electronic diary, ecological momentary assessment, experience sampling, clinical assessment,
The goal of clinical assessment is to provide a better under-
standing of the client, including a conceptualization of the problem
at hand, a prescription for treatment, and an evaluation of the
effects and effectiveness of interventions. Thus, clinical assess-
ment is not simply an exercise to be performed when a client first
presents for treatment. Rather, it is an ongoing process that informs
the clinician (and client) throughout the treatment. To be sure, the
initial diagnostic formulation is important. However, much is to be
gained by expanding the timeframe of clinical assessment to
provide a better sample of a client’s behavior (over time and over
situations), to allow for modification of the conceptualization of
the target problems, and to track improvement. Diary assess-
ments—asking clients to monitor and record important states,
events, and behaviors between treatment sessions—are uniquely
well suited to these goals.
In basic and clinical research, use of mobile technologies to
gather diary data has grown in recent years. Mounting evidence
suggests that electronic diaries (EDs) and more traditional assess-
ment techniques, such as retrospective self-report and interview
measures, often yield qualitatively different information. To the
extent that they contribute a novel perspective, EDs have the
potential to enhance the clinical assessment enterprise and deepen
clinicians’ understanding of their clients’ symptoms, motives, and
life circumstances. Practicing psychologists have not yet embraced
EDs as part of routine clinical assessment, but it is likely that many
will grow interested in these methods as their potential benefits
become more widely appreciated, as techniques for implementing
EDs become more familiar, and as mobile devices become more
ubiquitous and inexpensive.
Commensurate with the goals of this special section, the current
article provides a comparison between ED assessment and other
assessment approaches, a description of the potential use of EDs in
clinical assessment, and a discussion of the challenges and limi-
tations inherent to this approach. In addition, we give examples of
clinical phenomena and behavioral targets that may be particularly
amenable to ED assessment methods. Because many of the fundamental
issues in diary and ED assessment have arisen in research contexts,
many of the issues reviewed here unavoidably
overlap with topics covered in recent reviews intended for basic
researchers (Bolger, Davis, & Rafaeli, 2003; Conner Christensen,
Feldman Barrett, Bliss-Moreau, Lebo, & Kaschub, 2003; Hufford,
in press; Scollon, Kim-Prieto, & Diener, 2003; Stone & Shiffman,
2002). Throughout, however, we attempt to consider these issues
in light of the challenges facing a typical clinician in daily practice.
Gregory T. Smith served as Action Editor for this manuscript.
Thomas M. Piasecki, Marika Solhan, and Timothy J. Trull, Department
of Psychological Sciences, University of Missouri—Columbia; Michael R.
Hufford, invivodata, inc., Pittsburgh, PA.
Michael R. Hufford is now at Amylin Pharmaceuticals, Inc., San Diego,
CA. He retains a financial interest in invivodata, inc., a company that
provides ED and EMA support for clinical trials.
Preparation of this article was supported in part by National Institute on
Drug Abuse Grant DA016330, National Institute of Mental Health Grant
MH069472, and National Institute on Alcohol Abuse and Alcoholism
Correspondence concerning this article should be addressed to Thomas
M. Piasecki, Department of Psychological Sciences, 210 McAlester Hall,
University of Missouri—Columbia, Columbia, MO 65211. E-mail:
2007, Vol. 19, No. 1, 25–43
Copyright 2007 by the American Psychological Association
We assume that this special section will be read by clinicians
interested in innovative assessment techniques but varying in their
familiarity with diary methods. Therefore, we begin with an over-
view of some of the foibles of retrospection, the unique potential
of diary methods in assessment, and evidence suggesting that EDs
can enhance the quality of collected diary data. In the second half
of the article, we consider in greater detail practical issues related
to designing and implementing ED assessments as well as means
of using the data and of integrating them into clinical practice.
Diaries and EDs: Rationale, Benefits, and Limitations
Complexity of Seemingly Simple Retrospective Self-Reports
A busy practitioner might wonder why diaries of any kind
should be considered as part of the assessment routine. Why not
simply “cut to the chase,” asking directly about past events, states,
or behaviors of greatest interest with questionnaire measures or
interview questions? Given their tremendous expediency, simple
retrospective self-reports are unlikely ever to be replaced as the
workhorses of routine clinical assessment. However, a large body
of literature from cognitive science has highlighted the complex
processes involved in formulating responses to retrospective ques-
tions. This research has suggested there can often be a gulf
between “evaluators’ hopes and participants’ reality” (Schwarz &
Oyserman, 2001, p. 128) and has provided a rationale for collec-
tion of diary data in many settings.
Research has suggested that answering even a simple, factual
question (e.g., “How many alcoholic beverages did you consume
last week?”) involves numerous cognitive processes, each of
which may cause the respondent’s answer to deviate from the
objectively true answer (Bradburn, 2000; Broderick, Stone, Calv-
anese, Schwartz, & Turk, 2006; Schwarz & Oyserman, 2001).
Respondents do not typically retrieve all instances of a behavior
from memory and then count them up. Failure to do so seems to
reflect real limitations in the structure of memory rather than poor
motivation; unless the behavior in question is especially salient or
infrequent, respondents simply may not be capable of a recall-
then-count strategy (Hammersley, 1994; Schwarz & Oyserman,
2001). Questions about common but variable states (e.g., psychi-
atric symptoms, mood) require the respondent to subjectively parse
then weight and integrate a stream of experience, and this task is
even more demanding than asking a respondent to recall and count
discrete events. For both types of questions, respondents must rely
on heuristic strategies to estimate the best answer.
A number of factors may affect this estimation process. Re-
sponse scales provided by the investigator shape expectations
about how much of a behavior is a lot; as a result, changing the
breadth and ordering of response categories can systematically
affect the answers obtained (Schwarz, 1990). Current mood states
and contextual cues (including the content of surrounding ques-
tionnaire items) may affect the availability of exemplars for re-
trieval from memory and affect response estimates (Kihlstrom,
Eich, Sandbrand, & Tobias, 2000; Menon & Yorkston, 2000;
Robinson & Clore, 2002). Frequency estimates for discrete events
tend to cluster about round numbers (e.g., reporting 20 cigarettes
per day). Dates of past events are often recalled as having happened
more recently than they really did (telescoping; Bradburn,
2000) and are often influenced by calendar factors (e.g., multiples
of 7 days). Recall of conditions preceding a salient event (e.g.,
relapse to drug use, arrest, divorce) may be systematically distorted
by knowing the outcome. That is, respondents may (unwittingly)
construct a revisionist history to explain what happened
(e.g., “I must have really felt badly to do such a thing.”). When
searching episodic memory, individual respondents are likely to
differ in the kinds of data they consider and the methods they use
to integrate retrieved information (Broderick et al., 2006). For
some questions (e.g., about distant, mundane, or hypothetical
events), searching episodic memory may provide little useful information
for constructing a response. In these instances, respondents
may rely on naïve theories of human nature or salient aspects
of their self-concepts to generate answers (e.g., Conner Christensen,
Wood, & Feldman Barrett, 2003; Robinson & Clore, 2002).
In sum, the apparent simplicity of retrospective questions is
often deceptive. When respondents generate answers to these
questions, they engage in complex processing and are typically
estimating an answer rather than providing a direct account of
factual information retrieved easily and accurately from memory.
In any given context, it may be difficult to know the net effect of
the estimation process on the verisimilitude of the responses.
It should be noted that the existence of these complex influences
on self-report does not mean that retrospective data are hopelessly
and inherently flawed or that they bear no statistical relation to
“objective” states or events. To be sure, much of the accumulated
knowledge in the behavioral sciences has arisen from the use of
such data, and there is often considerable overlap between respon-
dents’ recalled and real-time ratings of experience (e.g., Stone,
Broderick, Shiffman, & Schwartz, 2004). Still, to the extent recall
is limited, methods minimizing the need for recall have the poten-
tial to yield a different perspective on events of interest. Diary
methods represent one strategy for gaining an understanding of
respondents’ experiences in their natural environments (Bolger et al., 2003).
Diary Methods: Gaining an Unfiltered Perspective
Diary data have long been used in basic research (Scollon et al.,
2003) and have been an important component of clinical assess-
ment in some therapeutic traditions (e.g., self-monitoring in be-
havior therapy; Kanfer, 1970; McFall & Hammen, 1971; O’Hara
& Rehm, 1979). The cardinal strategic advantage of diary data is
that, by asking respondents to record their experiences at or near
the time at which they happen, the influence of processes unique
to recall is minimized. Several secondary benefits flow directly
from this property of the method. Under ideal conditions, accurate
dates and times of important events should be evident in the data
record. Subsequent events cannot influence ratings because the
respondent cannot know this information at the time of recording.
The assessor may manipulate the diary records to construct summaries
of experience that are as quantitatively precise and consistent
PIASECKI, HUFFORD, SOLHAN, AND TRULL
(within or between respondents) as the underlying diary data
will permit. Counts of behaviors can be made directly, avoiding
the influence of rounding error. Basic summary statistics, such as
means of subjective states, will weight ratings equally, providing
estimates that are not unduly influenced by salient but rare mo-
ments of extreme experience. The assessor may conduct data
analyses to ask questions of the diary records that would be
difficult or impossible to ask respondents to estimate retrospec-
tively (e.g., “Did the intraday correlation between work stressors
and your depression symptoms change after you started your
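As a minimal sketch of the kinds of summaries just described, the following Python fragment computes a direct count, an equally weighted mean, and a within-day correlation from a handful of diary records. The records, the variable names, and the 0 to 10 rating scales are hypothetical illustrations and are not data or procedures from any study cited here.

```python
from statistics import mean

# Hypothetical diary records (an assumption for illustration only):
# (study day, work stress 0-10, depressed mood 0-10, drinks since last entry)
records = [
    (1, 2, 3, 0), (1, 6, 5, 1), (1, 8, 7, 2),
    (2, 1, 2, 0), (2, 4, 4, 0), (2, 7, 6, 3),
]

def pearson_r(xs, ys):
    """Pearson correlation; returns None if either series has no variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5

# Direct count of a discrete behavior: no rounding to "about a dozen".
total_drinks = sum(r[3] for r in records)

# Equally weighted mean of a subjective state across all entries.
mean_mood = mean([r[2] for r in records])

def intraday_r(day):
    """Correlation between stress and mood using only one day's entries."""
    rows = [r for r in records if r[0] == day]
    return pearson_r([r[1] for r in rows], [r[2] for r in rows])

print(total_drinks, mean_mood, intraday_r(1), intraday_r(2))
```

With only three entries per day, the correlations in this toy example are far too unstable to interpret; a real protocol would need many more observations per period before such a statistic became clinically meaningful.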
Several recent diary studies have demonstrated notable discrep-
ancies between near-real-time reporting of experiences and recall
of the same events, including clinically crucial quantities such as
baseline symptomatology and symptomatic change (Stone, Broderick,
Shiffman, & Schwartz, 2004; Williams et al., 2004).
A detailed and clinically relevant investigation by Shiffman et
al. (1997) provides a useful exemplar. Shiffman et al. (1997)
administered a retrospective questionnaire concerning characteris-
tics of the first lapse to smoking to 127 smokers who had experi-
enced a lapse during a cessation attempt 12 weeks earlier. These
smokers carried EDs during the cessation attempt, providing an
opportunity to compare near-real-time and recalled versions of the
same (presumably salient) behavioral event. Results revealed that
participants could not accurately date their first lapse; only 23%
provided the correct day, and the average error in dating was 14.5
days. Recall for situations and subjective states surrounding the
first lapse also tended to be poor. Retrospective estimates of
prelapse mood state, the domain in which smokers were most
accurate, correlated only .36 with diary reports. Participants who
smoked more since the lapse tended to exaggerate the impact of
the first lapse on their motivation to quit smoking, suggesting that
the outcome of the lapse (i.e., a return to heavy smoking) colored
retrospective evaluations of the lapse event. In general, retrospec-
tive reports showed a bias toward higher prelapse negative affect
ratings relative to the real-time reports. It is interesting to note that
although lapsers were not highly accurate in recounting the cir-
cumstances of their own first lapse, their retrospective estimates
corresponded fairly well to the modal pattern of lapse antecedents,
perhaps suggesting that general beliefs about quitting and relapse
influenced retrospective reports. Indeed, when a group of “never-
smokers” was asked to complete the same lapse questionnaire with
reference to expectations for a typical smoker’s relapse, their
responses were highly correlated with lapsers’ (often inaccurate)
retrospective accounts of their own lapses.
Do discrepancies of this magnitude matter, clinically speaking?
We believe so. In many forms of therapy for substance abuse,
clinicians interview patients about the circumstances surrounding
failed quit attempts and focus therapy around potent triggering
events identified through this process (e.g., Niaura et al., 1999;
Witkiewitz & Marlatt, 2004). If the client is unable to accurately
identify the antecedents of prior lapses, one can question the value
of the therapeutic decisions based on the interview data. Although
this example focuses on smoking behavior, the issue is clearly a
general one in clinical practice—whenever one uses interview
information to structure treatment, there is the potential for both
random error and systematic bias to impact treatment planning.
Diary data have the potential to improve practice if they reveal
evidence of functional relations that serve to maintain symptoms
(e.g., Haynes, Leisen, & Blaine, 1997) but that are not evident in
recall because they are inscrutable to or incompletely processed by
the client. Basic research using diary designs is increasingly dem-
onstrating discrepancies between monitored experience and pa-
tients’ beliefs about experience (and perhaps the traditional clinical
wisdom built on those beliefs). For example, although smokers
strongly believe they smoke to reduce negative affect, there is little
relation between ongoing smoking and affect ratings in diary
records (Shiffman, Gwaltney, et al., 2002). Despite a gloomy
outlook, depressed patients do not report more negative events
than nondepressed persons in diary records. Moreover, they report
smaller reactions to negative events and larger reactions to positive
events than do nondepressed persons (Peeters, Nicolson, Berkhof,
Delespaul, & de Vries, 2003).
This does not mean recalled experience solicited by interview
should be disregarded. Indeed, practicing clinicians are very likely
to be interested both in what “really” happens in their patients’
environments and emotional lives and in how their clients retro-
spectively appraise the same events. Discrepancies between the
two kinds of assessments have the potential to add value to the
therapeutic process. For example, such discrepancies could reveal
(and be instrumental in helping to change) characteristic cognitive
distortions of experience, considered to be important components
of risk for psychopathology and key targets for intervention in
many clinical traditions (e.g., Beck, 2005; Moses & Barlow,
2006). In this regard, EDs can allow assessors to concretely test
their implicit theories regarding the real-time relations among
maladaptive thoughts, feelings, and behaviors, which are thought
to underlie many forms of psychopathology.
Limitations of Diaries
As Hammersley (1994) has noted, “There will never be a
philosophers’ stone which will convert self-reported data into
absolutely accurate figures of quantity, frequency, and timing” (p.
283). It is important to recognize that diary data are not immune
from nuisance variance or error. Instead, diary data have a unique
profile of vulnerabilities relative to retrospective reports.
Because diary data consist of self-reports, they will be subject to
many of the same processes that characterize retrospective self-
report—in principle, they should be subject to any influence not
attributable to the recall process (e.g., influences of surrounding
item content or response scales). Additionally, diary designs often
require at least “microretrospection” over shorter time scales; they
may reduce, but not eliminate, potential retrospective biases.
Statistical summaries of diary records allow assessors to scru-
tinize respondents’ experience in ways that are ostensibly rational,
objective, consistent, and unbiased. There is a temptation to inter-
pret discrepancies between recalled experience and statistical sum-
maries of diary data as revealing error in the former. It may be
more appropriate to regard retrospective and real-time data as
being qualitatively different from one another. The kind of data to
be preferred will depend on the assessment problem. For example,
if the goal of assessment is to predict a client’s future behavioral
choices, retrospective assessment may be preferable to aggregated
data assessed in real time (e.g., Redelmeier & Kahneman, 1996;
Wirtz, Kruger, Scollon, & Diener, 2003). This is because people
generally must rely on their heuristically processed reflections to
inform their future choices in real-world decision making.
SPECIAL SECTION: INNOVATIVE ASSESSMENT METHODS
One of the key problems with retrospective self-reporting is that
the respondent’s episodic memory is limited—it may not contain
the critical data of interest to the assessor, or it may not be
accessible to the respondent in a form that can easily be searched
and aggregated. An analogous problem can easily arise in a poorly
conceived diary assessment. Diary protocols require that the as-
sessor make informed decisions about the frequency, timing, and
content of diary reports. If the diary protocol does not capture
important aspects of the domain to be assessed (i.e., the diary asks
the wrong questions or misses key states and events), then the
value of the collected diary data may be seriously limited.
The repeated measures nature of diary assessments entails re-
spondent burden. The diary investigation attempts to replace one
type of effortful process (a one-time search of episodic memory
and estimation of responses) with another (ostensibly simpler
accounts of recent experience provided repeatedly). The validity of
diary data may be severely threatened if respondents cannot shoul-
der the repeated measures burden and deviate from the diary
protocol. This issue, to which we now turn, has received extensive
scrutiny in the diary literature.
Situational constraints, lapses of motivation, and simple forget-
ting are facts of life that will lead to missing diary reports in any
extended diary protocol. Unfortunately, missed diary entries are
not always recorded as missing values in the data record. A major
limitation of conventional paper–pencil diaries is that they provide
no means for ensuring that entries are made in a timely fashion.
Respondents may backfill paper diary entries, sometimes even
completing large swaths of diary pages immediately before they
are to be returned to the assessor. The assessor generally has no
reliable way to detect the presence of backfilling or flagging the
Backfilling might be driven by several factors, including a
desire to avoid embarrassment over incomplete entries or misun-
derstanding of the diary instructions. Respondents may sincerely
believe that backfilled data are accurate or at least of greater value
to the assessor than none at all. Even assuming backfilling is
always performed carefully and in good faith, backfilled data
clearly present a problem for assessors: They defeat the purpose of
using diary data to gain a recall-minimizing perspective on daily
EDs may be programmed so as to permit greater certainty
regarding the timing of diary entries as well as enhanced data
legibility. Compliance-enhancing features arguably provide the
strongest rationale for using EDs in favor of traditional paper
diaries. The extent of paper diary backfilling and its effects on
diary data validity are currently a matter of some controversy in
the ED literature (cf. Broderick & Stone, 2006; Green, Rafaeli,
Bolger, Shrout, & Reis, 2006; Tennen, Affleck, Coyne, Larsen, &
DeLongis, 2006). Because this is an active and unsettled area of
research assessing the incremental benefits of EDs over their paper
diary counterparts, we discuss recent, influential investigations of
this question in some detail.
Assessing Diary Noncompliance
Stone, Shiffman, Schwartz, Broderick, and Hufford (2002) devised
a novel paper diary to permit an unobtrusive assessment of
backfilling. Forty patients with chronic pain were asked to com-
plete pain ratings on paper diary cards at three fixed times per day
(10 am, 4 pm, and 8 pm) over a period of 3 weeks. Unbeknownst
to the patients, the paper diaries had been fitted with an unobtru-
sive photosensor hidden in the binder of the diary that detected
every time the diary was opened and closed and that then recorded
a time and date stamp of these activities to a thin computer chip
sewn into the binder of the diaries. Diary cards were permanently
affixed to the binder with plastic rings so that compliant recording
required opening the diary.
Patients completed most of the paper diary cards (90%), creating
the appearance of acceptable compliance. However, the photocell
recordings were illuminating. The binder was opened within 30
min of the scheduled assessment time on only 11% of occasions
(estimated compliance improved to only about 20% when using a
more liberal 90-min interval). The data suggested that much back-
filling was performed long after the scheduled assessment. Photo-
cell records indicated that diaries were never opened on 32% of all
study days. The diaries patients turned in contained completed
diary cards for 92% of the scheduled assessments from these same
days. Such “diary hoarding” was not limited to a small number of
participants; a full 75% of patients turned in completed cards for
days the paper diary had not been opened. Noncompliance was not
limited to backfilling; 45% of participants actually forward-filled
the diary at least once; that is, they recorded today how much
pain they would be in tomorrow (Stone, Shiffman,
Schwartz, Broderick, & Hufford, 2003).
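The logic of verified compliance in a study of this kind can be sketched in a few lines of Python: a scheduled assessment counts as compliant only if some objectively timestamped event (here, a diary opening) falls within a tolerance window around it. The function name, the example timestamps, and the default window are assumptions made for illustration; this is not the scoring code used in the photocell studies.

```python
from datetime import datetime, timedelta

def verified_compliance(scheduled, events,
                        before=timedelta(minutes=30),
                        after=timedelta(minutes=30)):
    """Fraction of scheduled assessments with at least one timestamped
    event inside the window [scheduled - before, scheduled + after]."""
    if not scheduled:
        raise ValueError("at least one scheduled assessment is required")
    hits = sum(
        any(s - before <= e <= s + after for e in events)
        for s in scheduled
    )
    return hits / len(scheduled)

# Hypothetical single day with three fixed assessments and two diary openings.
scheduled = [datetime(2007, 1, 1, 10, 0),
             datetime(2007, 1, 1, 16, 0),
             datetime(2007, 1, 1, 20, 0)]
openings = [datetime(2007, 1, 1, 10, 12),
            datetime(2007, 1, 1, 21, 15)]

# Strict 30-min window: only the 10:00 assessment is verified (1 of 3).
strict = verified_compliance(scheduled, openings)

# More liberal 90-min window: the 20:00 assessment now also counts (2 of 3).
liberal = verified_compliance(scheduled, openings,
                              before=timedelta(minutes=90),
                              after=timedelta(minutes=90))
print(strict, liberal)
```

Note that a completed diary card contributes nothing to this index unless a matching timestamp exists, which is why returned-card completion rates and verified compliance can diverge so sharply.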
Merely making diaries “electronic” by porting questionnaires to
handheld computing devices is not sufficient to prevent backfilling
(e.g., Hyland, Kenyon, Allen, & Howarth, 1993). However, elec-
tronic devices permit the use of interactive features that can
encourage real-time compliance. Signaling (i.e., delivering an au-
dible beep when assessments are due) is relatively easy to imple-
ment with some ED platforms and may prompt timely entries that
respondents might otherwise forget to make. Software can be
programmed to prohibit backfilling by permitting respondents to
log entries only during an investigator-designated interval after a
discrete point in time or after a signaled prompt. When respondents
have helpful reminders and know they cannot backfill entries (i.e.,
the respondent knows that if she or he misses a prompt, the
assessor will eventually find out), compliance with desired record-
ing improves. A comparison group in the Stone et al. (2002)
photocell study carried palmtop computer EDs with these features
instead of paper diaries. These patients completed 94% of the diary
entries within 30 min of the scheduled diary entry time point, with
no possibility that the electronic records could have been back- or
forward-filled. We are not aware of any other studies in which
paper diary compliance was monitored objectively and unobtru-
sively. However, several other studies using clinical populations
have found that substantially more diary records are completed in
paper diaries than are completed with EDs that prevent out-of-
window responses, a pattern consistent with backfilling of paper
diaries (Gaertner, Elsner, Pollmann-Dahmen, Radbruch, & Saba-
towski, 2004; Lauritsen et al., 2004; Nyholm, Kowalski, & Aq-
Software-based prevention of backfilling (and consequent re-
moval of temptation) may be the most important compliance-
promoting feature of EDs. Broderick, Schwartz, Shiffman, Huf-
ford, and Stone (2003) conducted a follow-up to the Stone et al.
(2002) photocell study designed to test the incremental improve-
ment in compliance associated with adding audible prompts to the
original paper diary protocol. The investigators recruited another
27 pain patients by using identical procedures, essentially creating
a third arm to the Stone et al. (2002) study. Patients were asked to
make pain ratings in the photocell-equipped paper diaries on the
same schedule as before. However, this time they were provided
with programmable wristwatches that beeped for 60 s and dis-
played scrolling reminder text at each scheduled assessment time.
As before, the diary cards turned in to the investigators suggested
good compliance (85% of cards were claimed to have been com-
pleted within 30 min of scheduled time), but actual compliance
based on photocell recordings was poor (29%), and a high pro-
portion of returned diary cards (99%) from days on which the diary
was not opened were completed. A direct test of the signaled
versus unsignaled groups revealed that signaling was associated
with significantly higher rates of verifiable compliance (29% vs.
11%), but signaled paper diaries still produced less verifiable
compliance relative to the ED. Despite the high rates of detectable
backfilling, most participants (82%) rated themselves as very or
extremely successful at adhering to the diary protocol on a ques-
tionnaire measure of compliance completed after turning in the
diaries. It is interesting to note that there was a significant negative
correlation (r = –.43) between photocell-verified compliance and
self-reported success at compliance. This effect might suggest
deception on the part of noncompliant participants, underestima-
tion of compliance by conscientious patients who did not meet
their own high expectations for study adherence, or a mix of both
These studies have been highly influential but have also stirred
debate (Green et al., 2006; Tennen et al., 2006). Critics have
charged that the photocell studies compared best-practice EDs
with typical-practice paper diaries, a comparison that is likely to
build in advantages for the ED. Commentators have noted that ED
participants in the Stone et al. (2002) study understood their
compliance would be monitored and could not be faked. In con-
trast, paper diary patients did not know their compliance was being
monitored and were given no feedback on their compliance rates.
This might have tacitly encouraged noncompliance (Green et al.,
2006). Broderick and Stone (2006) have cautioned that the results
of their studies should not be generalized to all paper diary re-
search, and they acknowledged that parametric studies are needed
to understand the factors that contribute to noncompliance levels in
paper diary assessments.
Green et al. (2006) recently reported a series of studies evalu-
ating compliance with paper diaries and comparing data quality
across paper diaries and EDs. In one study, reanalyses were
reported from a sample of 42 participants asked to complete paper
diary reports 10 times per day according to a random time sam-
pling schedule. Participants were given preprogrammed wrist-
watches that signaled when diary entries were to be made. Diary
forms asked participants to record the actual time the diary entry
was made. The investigators compared the recorded entry times
against the signaling schedule to estimate compliance with the
protocol. Respondents completed 66.4% of the scheduled diaries.
When compliance was defined as entering a response time falling
in a window spanning 5 min before the scheduled signal to 15 min
after the signal, only 9.9% of the entered responses were noncompliant.
The rate of out-of-window responses fell to 4.4%
after eliminating 3 participants with pervasive noncompliance.
Because the timing of prompts was randomized, the authors con-
tended it would be implausible for the recorded diary times to
match the signal deliveries if respondents did not actually comply
with the diary instructions.
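To make the argument about randomized prompts concrete, the sketch below generates one day's random signal schedule and checks whether a self-recorded entry time falls in a window running from 5 min before to 15 min after a signal. The stratified sampling scheme (one prompt per equal-length block), the 9 am to 9 pm waking day, and all names are assumptions for illustration; the source does not describe how the original schedule was generated.

```python
import random

def random_schedule(n_prompts=10, start_min=9 * 60, end_min=21 * 60, seed=None):
    """Return n prompt times (minutes after midnight), one drawn uniformly
    from each of n equal-length blocks between start_min and end_min."""
    rng = random.Random(seed)
    block = (end_min - start_min) / n_prompts
    return [int(start_min + i * block + rng.uniform(0, block))
            for i in range(n_prompts)]

def in_window(recorded, signal, before=5, after=15):
    """True if a recorded entry time lies within the compliance window
    around a signal time (all values in minutes after midnight)."""
    return signal - before <= recorded <= signal + after

schedule = random_schedule(seed=1)
print(schedule)
print(in_window(recorded=605, signal=600))  # 5 min after the signal
print(in_window(recorded=590, signal=600))  # 10 min before the signal
```

Because a respondent cannot know the prompt times in advance, written entry times that match an unpredictable schedule are hard to produce by guessing. They could still be produced by noting the time at the signal and completing the ratings later.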
In a second study, Green et al. (2006) reanalyzed data from two
samples of participants asked to record their moods at 3-hr inter-
vals over a 1-week period. Sixty-two participants completed the
protocol by using paper diaries, writing down the date and time of
their entries on the diary form. Ninety-six participants used EDs to
record the data, and the devices recorded the dates and times of
entries automatically. ED participants provided slightly more re-
ports over the course of the study, but the proportion of entries
made within 2 hr of the preceding report did not differ across data
collection mode. Comparisons of the mood data across groups
suggested the ED group reported more within-subject variance in
negative affect, but internal consistency of the mood scales, cor-
relations between variables, mean levels of mood, and between-
subjects variance estimates did not differ across groups.
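The distinction between within-subject and between-subjects variance can be illustrated with a simple descriptive decomposition, sketched below in Python. The ratings are invented, and the approach (averaging each person's variance and taking the variance of person means) is a simplified assumption; it is not the multilevel estimation an investigator would ordinarily use for such comparisons.

```python
from statistics import mean, variance

# Hypothetical negative affect ratings, three diary entries per participant.
ratings = {
    "p1": [1, 2, 3],
    "p2": [4, 4, 4],
    "p3": [2, 6, 4],
}

# Each participant's average level across his or her own entries.
person_means = [mean(values) for values in ratings.values()]

# Between-subjects variance: how much participants' average levels differ.
between_var = variance(person_means)

# Within-subject variance: how much each participant fluctuates around his
# or her own mean, averaged over participants.
within_var = mean([variance(values) for values in ratings.values()])

print(between_var, within_var)
```

In this toy example the second participant shows no moment-to-moment fluctuation at all, so a recording method that flattened real variability in this way would lower the within-subject estimate while leaving person means largely unchanged.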
Finally, Green et al. (2006) conducted a third study in which 42
participants (members of 21 couples) reported on their moods,
relationship quality, and daily stressors once per day before bed-
time. Each participant recorded data for 1 week with a paper diary
and for 1 week with an ED. Both diaries asked participants to
indicate whether they were making their entry within 1 hr of
retiring for bed. Response to this item suggested that compliance
was not affected by data collection mode; this question was an-
swered affirmatively in 94% of the paper diary records and 92% of
the ED records. The EDs automatically recorded entry time.
Eighty-seven percent of the ED records were completed at times
consistent with expectable bedtimes (between 8 pm and 5 am).
This information was not available for paper diary entries. Anal-
yses of the recorded data again suggested broad equivalence of
substantive findings from the two modes of data collection.
Green et al. (2006) attributed these results to the care they took
in educating participants about the importance of compliance with
the diary design and in creating a collaborative relationship with
the respondents. The authors concluded that “compliance is much
more an issue of study design and participant motivation than it is
an issue of whether a diary is administered in paper-and-pencil
form or electronically” (p. 102).
Green et al.’s (2006) report clearly provides a counterweight to
the studies of Stone et al. (2002) and Broderick et al. (2003), but
it also has limitations. Unlike Stone et al.’s (2002) study, paper
diary entry times were not recorded objectively. Green et al.
(2006) were able to make plausible, but not definitive, inferences
about likely level of compliance with the paper diaries. For exam-
ple, although their Study 1 results suggested good correspondence
between the recorded diary times and the predetermined schedule
entered into the wristwatches, it is possible participants recorded
the prompt times when signaled but completed rating scales at a
later time (e.g., Broderick & Stone, 2006; Litt, Cooney, & Morse,
1998). It is not clear that differences in instructions or participant
motivation per se can explain their results, as these variables were
not explicitly manipulated or measured (Broderick & Stone, 2006).
Finally, the generalizability of their finding of broad equivalence
of the data across diary methods cannot be assumed; it might hold
for some research domains, populations, and diary designs, but not
for others (Takarangi, Garry, & Loftus, 2006; Tennen et al., 2006).
SPECIAL SECTION: INNOVATIVE ASSESSMENT METHODS
In sum, questions remain about the trustworthiness of paper
diary data and about the extent to which EDs produce incremental
benefits. At present, there seems to be consensus that (a) failure to
complete diary entries according to the diary protocol specified by
the assessor has the potential to seriously undermine the advan-
tages of diary methods; (b) compliance with paper diaries may
vary with factors such as the establishment of a working rapport
with the respondent, the demands of the diary protocol, and the
characteristics of the population; (c) the importance of verifiable
compliance will vary with the constructs under study and the goals
of the assessor; and (d) more systematic research is needed to
explore the scope and significance of paper diary noncompliance.
As Tennen et al. (2006) have pointed out, most assessors are not
especially interested in establishing the superiority of paper diaries
or EDs. Instead, they recognize the valuable capabilities of EDs
but grapple with the practical question of how to best allocate
limited assessment resources. Further research is needed to clarify
the conditions for which economical paper diaries are sufficient or
for which more expensive EDs are indispensable.
Intentional Bias: A Threat to All Diaries
Intentional bias can distort all forms of self-report. If a respon-
dent wants to mislead the assessor, there is nothing inherent in
either a paper diary or ED to prevent this. In a study of Type I
diabetics, Mazze and colleagues (1984) found not only that electronically
monitored glucose levels failed to correspond to paper
diary entries but also that the errors were systematic. Fully
two thirds of participants reported glucose values in their paper
diaries so as to obscure hyper- and hypoglycemia, leading to
biased clinical impressions of their glycemic control. In other
words, patients adjusted their glucose readings up or down, most
likely in an attempt to appear to have their glycemia under better
control than was actually the case. This is a sobering finding—
diary reporting bias can obscure a clear picture of clients’ true
This kind of intentional bias is much more likely to be of
concern in clinical assessment than it is in basic research. This is
because clients form and seek to manage relationships with their
therapists, because they are asked to report about painful or po-
tentially embarrassing topics, and because psychopathology may
directly affect judgment and diary compliance. Thus, clinicians
interested in using any type of diary to inform treatment must
encourage the client to provide veridical, “unflinching” diary en-
tries and explain to the client that assessment inferences and
treatment effectiveness may suffer if honest records are not kept.
In some clinical situations, the consequences of diary bias may
be substantial. For example, diary data are used to establish the
baseline severity of voiding to help decide whether a surgery is
needed in severe cases of urinary incontinence. If participants
overestimate their symptom severity (either through intentional
bias or inaccurate recall and backfilling), the potential exists for
diary noncompliance to lead to an inappropriate surgical interven-
tion (Hufford, in press). When the stakes are very high and the
potential for intentional bias exists, a more expensive but truly
objective assessment (e.g., inpatient observation or ambulatory
physiological monitoring equipment; see Haynes & Yoshioka,
2007) will be preferable to relying on self-report of any kind.
In this section, we take up a variety of practical questions related
to the use of EDs in clinical assessment, including selection and
configuration of diary technology, design of the diary protocol,
possible uses of ED data, and barriers to adoption in clinical
practice. Many of the considerations discussed below are relevant
to the use of paper diaries. However, to streamline discussion, we
presume in this section that the assessor has an interest in using the
Current Platforms for EDs
EDs can be implemented with numerous devices, and the pos-
sibilities will no doubt increase with the continued, rapid evolution
of mobile technologies and expanded demand for high-quality
diary data in healthcare. A relatively simple and often inexpensive
approach is to use an electronic device such as a pager or pro-
grammable wristwatch to signal respondents to complete paper
diaries (Broderick et al., 2003; Csikszentmihalyi & Larson, 1987;
Litt et al., 1998). As noted above, this approach represents an
improvement over a paper-only approach, though it can still result
in significant undetected noncompliance (cf. Broderick et al.,
2003; Green et al., 2006). Because these approaches rely on paper
diary forms, they are not well suited to flexible branching (use of
dynamic skip rules or tailored assessment modules triggered in real
time by provided responses). In principle, an assessor may design
a paper diary form with printed instructions guiding the respondent
to skip or complete unique parcels of items on the basis of
responses to key items, but this can be cumbersome. Assessment
branching is a task perfectly suited to computerized automation,
and it is commonly used with EDs based on computing platforms.
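As a rough illustration of why branching suits automation, the sketch below walks a small item bank in which each item decides, from the response just given, which item comes next. The item names, wording, and the `administer` helper are hypothetical and are not drawn from any particular ED package.

```python
# Hypothetical item bank: each item carries a rule that picks the next item
# from the response just given (None ends the interview).
ITEMS = {
    "alone": {
        "text": "Are you alone right now?",
        "next": lambda ans: "mood" if ans == "yes" else "companion",
    },
    "companion": {
        "text": "Who are you with?",
        "next": lambda ans: "mood",
    },
    "mood": {
        "text": "How do you feel right now (1-5)?",
        "next": lambda ans: None,
    },
}


def administer(answers, items=ITEMS, start="alone"):
    """Return the ordered list of items presented.

    `answers` maps item ids to responses and stands in for live input.
    """
    path, current = [], start
    while current is not None:
        path.append(current)
        current = items[current]["next"](answers[current])
    return path


print(administer({"alone": "yes", "mood": 3}))
# ['alone', 'mood']
print(administer({"alone": "no", "companion": "friend", "mood": 2}))
# ['alone', 'companion', 'mood']
```

The respondent never sees the skipped item, which is the practical advantage over printed skip instructions on a paper form.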
At present, personal digital assistants (PDAs) are the platform
most widely used to collect ED information. Barriers to entry for
PDA-based diaries can be relatively low. Database software that
allows a user to design diary-type forms may be purchased off the
shelf, and researchers have developed freeware programs that are
quite robust (links to several may be found in Conner, 2006).
Respondents can be prompted to complete these assessments by
using audible alarms set with standard calendar programs bundled
with most PDA operating systems. Most freeware programs incor-
porate routines for scheduling and delivering audible prompts.
Some allow respondents to make voluntary entries between
prompts. Most database programs allow time and date stamping of
the entered records. Thus, it is possible to implement a reasonably
sophisticated ED system inexpensively. In most clinical settings,
each client will require a tailored set of diary assessments and an
individualized prompt schedule. A low-cost, off-the-shelf setup
like this is sufficiently flexible for single-client adaptations. Avail-
able software packages vary in their capabilities, so different
assessment problems might require distinct solutions.
Programming enthusiasts might find numerous ways to elabo-
rate the basic off-the-shelf packages described above. The nonpro-
grammer may wish to hire out the work; indeed, provision of
custom-designed PDA systems is currently a growth industry (the
interested reader who performs an Internet search using the term
patient diary will locate numerous vendors). High-end, custom-designed
systems may incorporate any number of desirable features,
including sophisticated logical branching; complex, configurable
signaling schemes; livability features (e.g., the ability to
suspend prompting for a short time when responding would be
especially inconvenient); wireless communication with other elec-
tronic devices; specialized response formats (e.g., body maps to
indicate pain sites); objective tests (e.g., reaction time tasks); and
automated data transfer to a central server for compliance moni-
toring by the investigator.
Although custom-designed PDA-based diaries are powerful,
they may be too costly or inflexible for many clinical
settings. Custom-designed patient diary solutions are chiefly used
in clinical trials of new pharmaceuticals and medical devices.
Bringing new drugs and devices to market requires meeting ex-
acting regulatory standards, and so the most sophisticated diaries
represent a considered investment in these contexts. Clinical trials
also tend to involve hundreds of participants, each of whom is
asked to record the same information in identical diaries. Thus,
such enterprises benefit from an inherent economy of scale per-
mitting allocation of significant resources to the coding, debug-
ging, and piloting of a single diary protocol. Commissioning
custom-developed diary solutions will typically make financial
sense only for large-scale, specialized clinical practices with a
compelling need to have all clients complete a common set of
assessments (e.g., a busy weight loss or eating disorders clinic
requiring all clients to monitor caloric intake).
Interactive Voice Response (IVR) systems are another com-
monly used technology for diary research. In IVR systems, respon-
dents interact with a data-collecting computer via telephone, usu-
ally by pressing touch-tone buttons. Implementing an IVR system
requires an investment in specialized telephony hardware and
software for programming and controlling interviews. The equip-
ment can be expensive but may compare favorably with PDA-
based assessment, especially when large numbers of participants
are completing diaries at the same time. This is because the
infrastructure is centralized and purchased once. Almost all clients
will have access to a telephone, so increasing the number of
respondents in the field does not increase the cost of assessment
until the volume of respondents outstrips the call capacity of the
IVR systems typically field incoming calls but can make out-
going calls as well. Outgoing calls may serve as prompts to record
diary entries. When outgoing calls serve as prompts and respon-
dents carry mobile phones, an IVR system can replicate many of
the best features of the most sophisticated PDA-based experience
sampling designs (Collins, Kashdan, & Gollnisch, 2003).
IVR has some advantages over PDA-based assessment. Because
of the ubiquity of telephones and the widespread use of IVR in
customer service, most respondents will need little training in the
use of an IVR interface. IVR users call in to a central computer,
allowing immediate screening of collected data by the assessor if
desired. If a respondent were to lose or break a PDA, data could be
lost or confidentiality might be compromised (if data were not
encrypted after entry). These concerns are not applicable in IVR
designs because all data are recorded in the central database, not
stored on respondents’ phones. Relative to IVR, PDAs have the
advantage of a more sophisticated interface. Thus, PDAs can use
a richer array of response formats (e.g., visual analog scales,
interactive picture maps, checklists). IVR systems, by contrast, are
generally limited to responses that can be mapped to telephone
keys. IVR systems can suffer from compliance problems, as re-
spondents may forget to make regular calls to passive, inbound-
only IVR systems (e.g., Shiffman, Dresler, et al., 2002). Nonethe-
less, ingenious assessments can be developed within these
constraints. For instance, objective cognitive assessments have
been administered successfully by IVR (e.g., Mundt, Ferber,
Rizzo, & Greist, 2001).
Diary data may be collected over the World Wide Web by
creating password-protected Web-based forms and instructing re-
spondents to log on at desired times to complete them. Web-based
forms have the advantages of centralized data collection, but
important disadvantages must be considered as well. The chief
problem with a Web-based diary is that it requires respondents to
have access to an Internet-connected computer to make a diary
entry. Some trials have suggested that it is difficult to get respon-
dents to log data routinely by Web because it is often inconvenient
to log on (Anhøj & Nielsen, 2004). Signaling is difficult to
integrate into Web-based designs, thus introducing sampling bias
predicated on when respondents choose to log on and make their
diary entry. The natural signaling mechanism—e-mail—will be
vulnerable to the same Web-access constraints as the Web diary
itself. Other signaling devices (e.g., wristwatches, pagers) might be
used, but their value is questionable if respondents do not have
ready access to a connected computer. These considerations sug-
gest that one cannot expect to gather many responses per day by
Web. To the extent this is true, Web-based diaries should be
limited to measuring processes that are expected to change relatively
slowly and for which short-term (e.g., 24 hr) retrospection is
unlikely to yield significant bias.1 A major potential limitation of
Web-based diaries is the possibility that third parties may intercept
the data. Secure encryption would be important for clinical appli-
cations for which confidentiality is paramount (Nosek, Banaji, &
The foregoing presupposes that self-reports are the main focus
of diary assessments. Mobile technologies are being developed and
tested for measuring a variety of specific behaviors in respondents’
natural environments. Examples include pill bottles that record the
times and dates of opening (Schmitz, Sayre, Stotts, Rothfleisch, &
Mooney, 2005), portable transducers for measuring smoking to-
pography (Hammond, Fong, Cummings, & Hyland, 2005), and
wearable devices that track blood alcohol concentration by mea-
suring excretion of ethanol through the skin (Swift, 2000). Small
digital recording devices have been used to sample acoustic data
from an individual’s environment. Judges can rate these recordings
to gauge a variety of variables such as mood, social interactions,
and time budgeting (e.g., Mehl, Pennebaker, Crow, Dabbs, &
Price, 2001). A thorough review of these technologies is beyond
the scope of this article, but their existence bears noting because
they may be combined with self-report diaries to create very
powerful assessments and may provide data that are especially
useful for helping patients change (e.g., Schmitz et al., 2005).
1 Some of these limitations can be expected to lessen as mobile devices
like cellular phones become increasingly Web-enabled and as Web pages
are increasingly optimized for display on small, mobile screens. Ultimately,
convergence of technologies in handheld devices may minimize
distinctions between PDAs, IVR, and Web-based systems. That is, most
patients may eventually carry personally owned phones with PDA computing
power, excellent Web browsing, and persistent wireless Internet
connections. This will represent an embarrassment of riches for the diary
designer. Given the rapid pace of technological innovation, it is difficult to
project the form that sophisticated EDs will assume by the time they gain
widespread acceptance in clinical practice; one day soon the technologies
described in this article may appear as quaint as the mechanical counters
that were used for self-monitoring in the heyday of behavior therapy (e.g.,
Lindsley, 1968; Mahoney, 1974).
Considerations in Selecting and Configuring EDs
Portability is an important feature of any diary. If
the goal is to capture ecologically valid, near-real-time samples of
respondents’ experience, then the diary must be mobile, swept
along, as it were, in the stream of daily experience. Current ED
platforms vary somewhat in their ease of integration into daily life.
Palmtop computers, pagers, programmable wristwatches, and similar
devices clearly meet this criterion, as they have been deliberately
designed to be convenient for use on the go. Telephone-based
IVR systems may be more or less integrated into the stream of
experience depending on the details of their implementation. When
cellular phones are used to connect with the IVR system (especially
when the assessor provides the phone or the airtime), the
system is fully portable and capable of fairly dense experience
sampling. If a static line (e.g., the home phone) is the primary
avenue of communication, this limits the amount and timing of
respondents’ interactions with the diary. Web-based diaries are
subject to similar access constraints. Thus, the choice of diary
platform will, to some extent, be driven by the degree to which the
assessor wants to densely sample the behavior stream. Fleeting
processes with nonignorable intraday variation will require more
portable, convenient diary platforms.
Quality control automation. Many important tasks and responsibilities
that must be entrusted to respondents in paper diary
research can be automated in ED applications. As discussed earlier,
a chief advantage of EDs relative to paper diaries is that they
can automatically record the time and date of diary responses and
prevent entry of responses attempted outside a predetermined
window of compliance.
Carefully designed EDs are also programmed to prevent entry of
logically inconsistent or impossible responses (e.g., a respondent
would be prevented from indicating she or he is currently both
alone and with a friend in the same diary entry), to prevent missing
and out-of-range responses, and to automatically trigger any interview
skip rules in real time based on provided responses.
Signaling and behavior sampling. Real-time signaling is an
important advantage of most EDs. Mobile ED platforms can audibly
prompt correctly timed responding. When diaries are to be
completed at fixed intervals, this spares the respondent the burden
of watching the clock. Signaling also permits the assessor to
sample experience according to more complex schemes (e.g.,
randomly sampled moments, random times within time strata) that
would be nearly impossible for respondents to execute without
preprogrammed prompts. Although there are many benefits associated
with signaling, it can also be intrusive. EDs often incorporate
livability features that ease the burden and inconvenience of
signaled recording. For instance, some EDs (e.g., Shiffman, Paty,
Gnys, Kassel, & Hickcox, 1996) have been programmed with a
nap function that allows respondents to temporarily suspend signaling
at times when responding would be especially inconvenient,
dangerous (e.g., while driving), or disruptive (e.g., during a religious
observance). Although suspending prompts may result in
missed or delayed reports, these reports would likely be missed
anyway. Allowing respondents to suspend prompting raises thorny
issues in quantifying diary compliance, and frequent suspension of
prompts may threaten the ecological validity of the diary data
(Scollon et al., 2003; Tennen et al., 2006). On the other hand,
giving respondents the ability to suspend prompting at especially
inconvenient times probably renders more tolerable the minor
inconveniences respondents are likely to encounter at other sig-
Many diary protocols require respondents to initiate a recording
whenever an important event occurs. Often an assessor will be
interested in obtaining a detailed assessment of the states and
behaviors antecedent to the critical event. This is not burdensome
when the event is relatively rare. Some classes of events (e.g.,
smoking cigarettes, eating, habit disorders) may occur so fre-
quently that it is not possible to administer a detailed assessment
about each occurrence without usurping a large portion of the
respondent’s day. Mobile EDs can solve this problem by allowing
respondents to quickly log each instance of a frequent behavior
and by triggering a complete interview in response to only a
portion of those recordings (e.g., Shiffman, Paty, Gwaltney, &
Dang, 2004). Assuming a reasonable scheme is used to sample the
events (e.g., randomly determined percentage, first occurrence
within predefined time windows), this strategy can provide the
assessor with a complete record of the event of interest and
detailed information about representative exemplars. This kind of
data record would be achievable in principle by using paper diaries
with appropriate instruction and training, but the automation pos-
sible in EDs is well suited to this sort of complex sampling
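The quick-log strategy described above can be sketched as follows, assuming the "randomly determined percentage" rule: every event is recorded, and each one independently has a fixed chance of triggering the full interview. The class name, the 25% default, and the method names are all hypothetical choices made for illustration.

```python
import random


class EventLogger:
    """Log every event; trigger a full interview for a random subset."""

    def __init__(self, interview_probability=0.25, rng=None):
        if not 0.0 <= interview_probability <= 1.0:
            raise ValueError("interview_probability must be between 0 and 1")
        self.p = interview_probability
        self.rng = rng or random.Random()
        self.events = []  # list of (timestamp, interviewed) tuples

    def log_event(self, timestamp):
        """Record the event and report whether a full interview should follow."""
        interviewed = self.rng.random() < self.p
        self.events.append((timestamp, interviewed))
        return interviewed

    def counts(self):
        """Return (total events logged, events followed by a full interview)."""
        total = len(self.events)
        interviewed = sum(flag for _, flag in self.events)
        return total, interviewed


logger = EventLogger(interview_probability=0.25, rng=random.Random(0))
for minute in range(20):  # 20 illustrative events
    logger.log_event(minute)
print(logger.counts())  # total is always 20; the interviewed count varies by seed
```

The complete event count is preserved regardless of how many interviews are triggered, so rates can still be computed from the full record.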
Security and confidentiality. EDs running on PDAs may be
programmed so that data are encrypted or otherwise protected on
entry. In IVR applications, data are transmitted to a central database
by phone, with no local data storage and thus no possibility
of a confidentiality breach. Care must be taken with Web-based
diaries to ensure secure data transmittal, but in principle these data
can be protected. It is important to note that the same features that
prevent prying eyes from accessing the data also prevent the
respondent from reviewing or editing prior entries. This likely
diminishes carryover bias between adjacent reports and ensures the
data captured in real time are not subject to respondents’ later
reappraisal of their experiences.
Considerations in Designing a Diary Protocol
Time and event sampling. Decisions about the desired frequency
and timing of diary records must be carefully thought
through by assessors and should be informed by the goals of the
assessment (Bolger et al., 2003; Delespaul, 1992; Stone & Shiffman,
2002). These decisions will also have consequences for the
types of ED platform considered suitable and for the total assessment
burden placed on respondents. In addition to practical considerations,
these decisions should be guided by consideration of
the clinical construct of interest. What is it that the clinician is
attempting to measure, does the nature of the construct have
implications for the frequency of assessment or the sampling
strategy, and what is the best way to measure the construct in the
The simplest diary protocols require respondents to make only
one entry per day, often in the evening so that the entire day can
be reflected on. These end-of-day designs impose relatively little
burden on respondents (Stone & Shiffman, 2002). They can be
supported with a variety of platforms because they do not neces-
sarily require portability or audible signaling. Once-daily assess-
ment may be useful for measuring processes or behaviors that are
discrete, are highly salient, occur relatively infrequently, and prob-
ably can be reported accurately in 1-day increments.
If the events of interest occur rarely, increasing the frequency of
reports beyond once per day often represents a poor cost–benefit
trade-off; multiple intraday assessments increase participant burden
but may not capture more instances of the event. Subjective
states expected to change very slowly over time (e.g., mood in
bereavement) might also be reasonably assessed with once-daily
measures. In these cases, diary questions can ask about the respon-
dents’ immediate state (e.g., “How do you feel right now?”). For
slow-moving processes, immediate samples of daily experience
can be taken infrequently and still track the process accurately.
Assessors interested in response domains that are more mun-
dane, frequent, fleeting, or variable will often require more than
one assessment per day to obtain the benefits of diary assessment.
As the frequency of assessments increases, portability of the ED
becomes correspondingly more important. Burden also increases,
perhaps decreasing the maximum interval over which the assess-
ment protocol can be maintained.
Depending on the nature of the construct of interest, assessors
may choose a time-based sampling strategy, an event-based strat-
egy, or a mix of the two approaches. Time-based schemes are
generally appropriate for tracking states that are naturally in flux
and that can be understood without reference to critical, discrete
setting events. Time-based schemes may also be used as comple-
ments to global retrospective self-reports; the assessor may not be
interested in examining intraday change processes but may want to
aggregate across multiple “dipstick” measures taken throughout
the day to construct an unbiased estimate of mean symptom levels.
Time-based schemes can use fixed schedules (always requiring
diary entries at the same times each day) or a random sampling
schedule. Random scheduling increases the representativeness of
the collected data and will generally require a mobile platform
with audible signaling capability. In general, random time sam-
pling may be more burdensome to the respondent because she or
he never knows when the next assessment will occur and cannot
plan other daily tasks around assessments (of course, planning
around assessments probably introduces bias into the record, so
this may be a case in which increased burden is offset by increased
data quality). Because it is possible for truly random samples to
occur near to one another in time, random time sampling has the
potential to frustrate a respondent and to hamper systematic ex-
amination of intraday change processes. For this reason, assessors
sometimes find it useful to stratify time sampling, breaking the day
into several discrete windows and randomizing the timing of diary
prompts within each window.
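A minimal sketch of stratified random time sampling, as described above, follows. It divides the waking day into equal windows and draws one uniformly random prompt time within each. The function name, the 9 am to 9 pm day, and the choice of four windows are assumptions made for illustration only.

```python
import random
from datetime import datetime


def stratified_prompts(day_start, day_end, n_windows, rng=None):
    """Return one random prompt time inside each of n equal windows."""
    if n_windows < 1 or day_end <= day_start:
        raise ValueError("need at least one window and a positive day length")
    rng = rng or random.Random()
    width = (day_end - day_start) / n_windows  # timedelta per window
    prompts = []
    for i in range(n_windows):
        window_start = day_start + i * width
        # Uniform offset within the window.
        prompts.append(window_start + rng.random() * width)
    return prompts


# Illustrative schedule: four windows between 9:00 and 21:00.
start = datetime(2007, 1, 8, 9, 0)
end = datetime(2007, 1, 8, 21, 0)
for t in stratified_prompts(start, end, 4, rng=random.Random(42)):
    print(t.strftime("%H:%M"))
```

Stratifying guarantees coverage of each part of the day, but two prompts can still fall close together on either side of a window boundary; an assessor who wants a guaranteed minimum gap would need to add that constraint separately.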
Assessors may want to use an event-sampling scheme when the
primary interest concerns critical, discrete events that would often
be missed by time sampling alone. For example, EDs might be
used in the assessment of panic attacks. For many, panic attacks
seemingly come “out of the blue” and are unpredictable. There-
fore, EDs that can be accessed 24 hr per day may be particularly
suited to the clinical assessment of these attacks. Not only can the
frequency of such attacks be recorded, but it is also possible to
assess additional information such as the severity of attacks, the
timing of attacks (onset, duration), the antecedent circumstances,
the exact symptom profiles of attacks, the coping strategies used
and the outcome, and the aftermath of attacks. Obviously, this type
of assessment would be quite rich and quite useful for the clinician.
The value of EDs in this case is that the assessment can occur in
close temporal proximity to the panic attack, providing the clini-
cian with an electronic bird’s-eye view of the client’s experience.
Other examples of event sampling are life events or experiences
that might trigger or exacerbate negative affective states. For
instance, it might be of clinical importance to assess a depressed
client’s response to and interpretation of particular stressful life
events. Perhaps the client has high levels of dependency, making
him or her more vulnerable to perceived rejection or interpersonal
conflict. These experiences are especially salient to depressed
individuals with strong dependency needs. A real-time assessment
using an ED would allow for an evaluation of the impact of these
events, the client’s thoughts and cognitions that followed the
events, and the effect of these events on subsequent mood, for
example. In this scenario, the event that would prompt the client to
initiate an ED session might be an argument with a spouse or
partner, perceptions of being left out or rejected, or parent–child
Event sampling requires self-monitoring and cannot be aided by
signaling; respondents must identify the occurrence of the event
and initiate recording themselves. Notably, this leaves room for
undetected noncompliance or respondent error (Litt et al., 1998).
Respondent-initiated recording may be especially difficult when
the events of interest are parametric changes in continuously
fluctuating experiences (e.g., strong negative mood, urge or temp-
tation to smoke after quitting). To the extent such state changes
occur insidiously rather than abruptly, the respondent may have
difficulty determining when a threshold value, worthy of record-
ing, has been attained.
Event sampling, used alone, has limited applications, such as
monitoring rates of a symptom or behavior. This may be very
useful clinically as a gauge of compliance with prescribed behav-
iors (e.g., medication use, homework) or as a measure of response
to treatment (e.g., “Are binges becoming less frequent across a
course of therapy?”). The validity of clinical inferences made from
an event-sampling design depends on how carefully the respondent
logs requested events. Self-monitoring of volitional events may
result in measurement reactivity—paying attention to one’s behav-
ior may increase the frequency of desired acts and decrease unde-
sirable acts (Hufford, Shields, Shiffman, Paty, & Balabanis, 2002).
This is a considerable problem in basic research, but it tends to
work to the clinician’s advantage (Shiffman, 1988).
Event sampling is often used in conjunction with a concurrent
time-sampling protocol so that user-initiated recordings of espe-
cially important moments can be integrated with background in-
formation about surrounding experiences. An assessor can easily
develop erroneous hypotheses about the factors driving monitored
events by examining only event-based reports (Sutton, 1993).
Time sampling provides an estimate of the base rates of explana-
tory states and behaviors in the respondent’s experience. A clini-
cally significant setting event or outcome should not only co-occur
with the event but also should be more frequent in event records
than would otherwise be projected from the base rate alone (Paty,
Kassel, & Shiffman, 1992).
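The base-rate logic above can be illustrated with a small comparison of event records against time-sampled records. The record layout, the `alone` flag, and the helper names are hypothetical, and the sketch reports only a descriptive difference in proportions; it does not test whether the difference is statistically reliable.

```python
def proportion(records, key):
    """Share of records in which the candidate setting event is present."""
    if not records:
        raise ValueError("no records to summarize")
    return sum(1 for r in records if r[key]) / len(records)


def compare_to_base_rate(event_records, random_records, key):
    """Contrast how often `key` is present at events versus at random prompts."""
    event_rate = proportion(event_records, key)
    base_rate = proportion(random_records, key)
    return {
        "event_rate": event_rate,
        "base_rate": base_rate,
        "difference": event_rate - base_rate,
    }


# Illustrative data only: was the client alone at each recording?
event_records = [{"alone": True}, {"alone": True}, {"alone": True}, {"alone": False}]
random_records = [{"alone": True}, {"alone": True}] + [{"alone": False}] * 6
print(compare_to_base_rate(event_records, random_records, "alone"))
# {'event_rate': 0.75, 'base_rate': 0.25, 'difference': 0.5}
```

In this made-up case, being alone accompanies 75% of events but only 25% of randomly sampled moments, which is the pattern that would mark it as a candidate setting event. Had the client been alone at 75% of random prompts as well, the co-occurrence would carry no information.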
A complete consideration of the most appropriate application of
time- and event-based sampling is beyond the scope of this review.
Interested readers are referred to the relevant literatures on the
experience sampling method (ESM; Reis & Gable, 2000) and
ecological momentary assessment (EMA; Stone & Shiffman,
1994; Hufford, in press).
ED recording can be taxing for the re-
spondent. Respondent burden might take on unique dimensions in
the context of clinical assessment. On the one hand, the clinical
assessor may have an advantage over the basic researcher; if
patients are convinced that the recording process will aid their
recovery, they may be especially tolerant of recording burden. On
the other hand, the clinician needs to proceed cautiously because
inadequately justified respondent burden (or exhortations to im-
prove diary compliance) could distract from the course of treat-
ment or otherwise sour the therapeutic relationship. Such issues
have not been systematically studied, and so it is unclear whether
either of these effects is large or under what circumstances either
might occur. Nonetheless, the possibilities are worth noting; pru-
dence may be called for when introducing ED methods in the
Several aspects of the ED protocol will affect the total burden on
respondents. As mentioned above, the sampling scheme and rate
will have important effects on total burden. Requiring more as-
sessments per day, randomizing assessments (thereby making
them unpredictable), and requiring respondents to initiate record-
ings each increase the total effort required. The complexity of
assessments also affects burden. Assessments requiring complex
cognitive tasks (e.g., averaging some experience over time, com-
pleting timed math tasks to measure cognitive function) increase
the burden (Hufford & Shiffman, 2003).
The duration of monitoring also affects total burden. Over the
first several days of monitoring, practice effects may speed record-
ing and temporarily ease respondent burden. As monitoring con-
tinues, however, the burden mounts. Long-term monitoring is
certainly possible. Intensive daily monitoring protocols in treat-
ment seekers have been maintained for as long as 4–8 weeks
(Collins et al., 1998; Shiffman et al., 1996). Non-treatment seekers
have been monitored with once-daily IVR assessments for as long
as 2 years (Helzer, Badger, Rose, Mongeon, & Searles, 2002). All
else being equal, burden is bad for patient motivation and data
quality. It is conceivable that patients would consent to ongoing,
open-ended recording when the diary information tracks treatment
progress and is integrated into the therapeutic process. As a general
rule of thumb, however, one should have a clear rationale to justify
extended monitoring. Long-duration recording may be most useful
when the patient is undergoing a distinct change process (e.g.,
quitting drinking, starting a new medication) during which a lot of
“action” is expected. In many cases, short bouts of ED monitoring,
separated by longer assessment-free intervals, will balance respon-
dent burden and clinical information gathering. In clinical settings
in which time and resources for data analysis are limited, long-
term monitoring might be avoided simply because it will not be
possible to examine all of the collected information.
The length of the assessment battery comprising each diary
report is another component of total burden. In general, there is a
trade-off between the breadth of the individual diary record and the
frequency with which that assessment can be delivered. As the
length of the diary assessment increases, recording frequency must
be reduced to maintain a constant degree of burden. When intraday
sampling is dense, brevity is at a premium. Obviously, there is also
a brevity–information trade-off. More than in other clinical assessment
enterprises, ED assessments require that assessors sacrifice
many nice-to-have items from their assessments, leaving a distil-
late of need-to-have content. Often this need for brevity will work
at cross-purposes with conventional psychometric goals. For in-
stance, the assessor may need to create ultra-short forms of estab-
lished scales by selecting subsets of items that cover the domain
adequately. Single-item assessments may have to suffice for tapping
some constructs. These decisions may negatively affect the
reliability or validity of the assessment, so care must be exercised
when very brief ED assessments are desired. Pilot work using
longer ED assessments with concurrent measurement of key cri-
terion variables may be useful for empirically identifying brief
item parcels optimized for reliability and predictive validity. At-
tention to face validity and to carefully training respondents about
the intended meaning of items may also mitigate psychometric
problems associated with very brief assessments.
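The trade-off between diary length and recording frequency can be made concrete with simple arithmetic. The sketch below assumes, purely for illustration, that daily burden is approximated by the minutes needed per report multiplied by the number of reports per day. The function name and the 36-min daily budget are hypothetical and are not drawn from any of the studies cited here.

```python
def max_reports_per_day(daily_budget_min: float, minutes_per_report: float) -> int:
    """Largest number of daily reports that fits within a fixed time budget."""
    if minutes_per_report <= 0:
        raise ValueError("minutes_per_report must be positive")
    return int(daily_budget_min // minutes_per_report)


# Hypothetical budget: 36 min of recording per day.
# Longer reports leave room for fewer prompts at the same total burden.
for minutes in (1.5, 3, 6, 12):
    print(f"{minutes} min per report -> {max_reports_per_day(36, minutes)} reports per day")
```

Time per day is only one facet of burden; unpredictability of prompts and item complexity, noted above, are not captured by this calculation.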
Stone, Broderick, et al. (2003) systematically varied burden by
assigning chronic pain patients to make pain ratings in an ED
either 3, 6, or 12 times per day. In all groups, the ED assessment
consisted of 19 questions that required approximately 3 min to
complete. Patients asked to make 12 responses per day reported the
highest degree of burden, but the absolute level of their ratings was
low (i.e., a rating of slightly burdened). Groups did not differ in
their compliance with the diary, and there was no evidence of
reactivity in pain ratings associated with sampling density. Burden
is an important consideration and needless burden should always
be avoided. Nonetheless, these data suggest it is possible to
achieve high rates of compliance with dense random sampling
protocols when carefully designed and implemented ED protocols are used.
Training and run-in.
Respondents will generally require at
least 30–60 min of training in the use of an ED. Details of the
training will vary depending on the constructs assessed, diary
platform, sampling scheme, and so on. In general, respondents
should be guided through an entire diary entry, item by item.
Training should focus on technical details of the user interface,
such as how to initiate recording (whether signaled or user initi-
ated), proper use of the response scales, how to advance from one
item to the next, how to review entries (if applicable), and how to
terminate an entry. A guided item-by-item review provides an
occasion to discuss the intended meanings of any potentially
ambiguous items; this may be especially important when the ED
platform requires using relatively telegraphic item wording to
conserve screen space (PDAs) or air time (IVR).
Training should also focus on important protocol topics such as
where and when the diary should be completed, what circum-
stances should trigger respondent-initiated recordings (if applica-
ble), what should be done if an entry is missed, under what
circumstances it may be permissible to suspend prompting or forgo
making an entry (such as while driving), and what is the lowest
level of diary compliance that threatens the validity of the enter-
prise. Applicable technical issues should also be considered such
as what to do (or whom to contact) in the event of a device failure,
how to identify battery levels and recharge the ED as needed, and
how to adjust features like screen brightness and prompt volume.
The existing research literature has suggested that ED methods
can be learned fairly easily by a wide variety of patient popula-
tions. Indeed, respondents have successfully interacted with hand-
held EDs while intoxicated (Collins et al., 1998; Hufford et al.,
2002), suggesting that respondents need not possess extraordinary
cognitive powers for the method to be successful. Nonetheless,
there may be patient populations for whom computing technology
is unfamiliar (e.g., the elderly, low-SES persons on the wrong side
of the digital divide) or who have physical limitations (arthritis,
visual impairments) that prevent use of small devices such as
PDAs. Use of EDs with these populations may be contraindicated
or may require extra training or care in the design of user interfaces
(Palmblad & Tiplady, 2004).
A run-in or trial period of several days is often useful before the
diary data are seriously scrutinized. Allowing respondents to in-
teract with the ED for several days gives them practice and allows
them to identify and to ask any questions about the protocol that
would not have been obvious without hands-on experience. The
assessor may wish to monitor compliance and deliver corrective
feedback during the run-in period. Research has documented a
repeated measures effect, a general tendency for ratings of sub-
jective states to change systematically over a period of several
days before stabilizing. This appears to be directly attributable to
the reactive effects of self-monitoring and experience with the
rating task rather than to the change in the underlying construct per
se (Gilbert et al., 1998; Sharpe & Gilbert, 1998). Thus, discarding
(or at least cautiously eyeing) data from the first several days of
recording may prevent the assessor from drawing specious infer-
ences about apparent patient change.
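Setting aside run-in data is straightforward once diary records carry timestamps. The following is a minimal sketch, assuming each record is a (timestamp, rating) pair and assuming a 3-day run-in; the function name, the record layout, and the 3-day default are illustrative choices, not values recommended in the literature cited above.

```python
from datetime import datetime, timedelta


def drop_run_in(records, run_in_days=3):
    """Return only the records made after the first `run_in_days` calendar days.

    `records` is a list of (timestamp, rating) tuples. The 3-day default is an
    illustrative assumption, not an empirically derived cutoff.
    """
    if not records:
        return []
    first = min(ts for ts, _ in records)
    # Count whole calendar days starting from the date of the first entry.
    cutoff = datetime.combine(first.date(), datetime.min.time()) + timedelta(days=run_in_days)
    return [(ts, rating) for ts, rating in records if ts >= cutoff]


# Made-up records for illustration only.
records = [
    (datetime(2024, 1, 1, 9, 0), 4),
    (datetime(2024, 1, 3, 21, 0), 3),
    (datetime(2024, 1, 4, 8, 0), 2),
]
kept = drop_run_in(records)
print(kept)
```

An assessor who prefers to eye the early data cautiously rather than discard them could plot both subsets side by side instead of filtering.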
Using the Data
In this section, we present examples of the types of analyses we
expect to be most commonly of use in clinical hypothesis testing
and to be best at fostering productive discussion between client
and clinician. Although only a few examples are presented here,
we note that potential uses of the rich data obtained from an ED
protocol are numerous and will depend on factors such as the
goals of assessment, the diary design, and the constructs assessed.
Readers interested in the statistical analysis of ED data are referred
to existing reviews of the topic (e.g., Affleck, Zautra, Tennen, &
Armelli, 1999; Bolger et al., 2003; Schwartz & Stone, 1998).
In our examples, data are plotted because we assume graphical
summaries of the data will be especially useful to both the clinician
and the patient. We focus on interpreting patterns visually because,
owing to the realities of clinical practice (time demands, client
variation in receptivity to feedback couched in terms of statistical
parameters), we expect clinicians would typically use diary data in
this way. We hasten to add that diary data may be analyzed with
greater statistical rigor, and we do not intend to discourage more
precise analyses. For example, a recent case study of the causes of
the urge to exercise in an anorexia nervosa patient by Vansteelandt
et al. (2004) has provided a detailed example of a more rigorous
An interesting and important issue is whether ED data provide
unique information about symptoms, experiences, or events that
could not be obtained by traditional clinical methods like a clinical
interview or a questionnaire. Clearly, if the same information can
be obtained through a traditional clinical assessment method, there
is no reason to rely on a time-intensive method of data collection
like an ED. Here we provide an example in which ED data do
appear to provide unique information. Figures 1 and 2 present
frequency data on the number of mood shifts, both negative and
positive, that were endorsed by a male patient with borderline
personality disorder during 4 weeks of monitoring. In both figures,
bars represent the number of mood shifts endorsed for that time
period, assessed with an ED and with the patient’s self-report on a
retrospective calendar-based interview at the end of the 4-week
recording period. In each figure, separate plots are given for shifts
in negative and positive mood. By using ED data, we operationalized
a mood shift (affective instability) as a ≥ 1.5 standard
deviation change from the patient’s immediately preceding report
of negative (or positive) affect. The top panel of Figure 1 shows
that ED data indicated eight negative mood shifts over the 1st
week whereas only one negative mood shift was reported on the
calendar measure. For all weeks but Week 4 (the week immedi-
ately preceding administration of the recall measure), there are
significant discrepancies as to the number of both negative and
positive mood shifts on the basis of the ED measure versus the
retrospective measure. As Figure 2 shows, however, even for
Week 4 (which actually covered 8 days) there are discrepancies in
the report of the dates of the shifts. For example, not only did the
number of negative mood shifts derived from the two reporting
methods not agree, but there was no day in Week 4 in which there
was at least a single report of a negative mood shift for both methods.
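The mood shift criterion used for the ED data can be sketched in a few lines. This is an illustration under stated assumptions rather than the exact procedure used for Figures 1 and 2: we assume the standard deviation is the patient's own sample standard deviation across all supplied ratings and that changes in either direction count as shifts. The function name and the ratings are hypothetical.

```python
from statistics import stdev


def count_mood_shifts(ratings, threshold_sd=1.5):
    """Count reports that differ from the preceding report by >= threshold_sd SDs.

    Assumptions (not specified in the text): the SD is the within-person sample
    SD over all supplied ratings, and shifts in either direction are counted.
    """
    if len(ratings) < 2:
        return 0
    sd = stdev(ratings)
    if sd == 0:
        return 0  # no variability, so no shifts can be identified
    return sum(
        1
        for previous, current in zip(ratings, ratings[1:])
        if abs(current - previous) >= threshold_sd * sd
    )


# Made-up negative affect ratings in chronological order.
negative_affect = [2, 2, 3, 2, 7, 2, 3, 2]
print(count_mood_shifts(negative_affect))
```

Applying the same function separately to negative and positive affect ratings, grouped by week or by day, would yield counts of the kind summarized in the figures.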
Clearly, we cannot generalize from these single-subject data to
all clinical situations and phenomena. However, what is most
intriguing about the Week 4 data is that the data come closest to
mimicking the typical clinical assessment situation. Clinicians
often ask their clients what has transpired over the last week,
including predominant moods, symptoms, or events. Data from
Figure 2 suggest that at least retrospective reports on mood shifts
over the last week do not correspond well to real-time data pro-
vided by this patient while in his natural environment. It may be
quite important to know that a significant negative mood shift
occurred on Days 2, 3, 4, and 7 (the day before the calendar
measure was completed!) according to reports on the ED. Of
course, ultimately, these ED reports must be shown to be associ-
ated with important outcomes or with treatment response. How-
ever, these data do show that clinicians at the very least should be
cautious in assuming that retrospective, review-of-the-week re-
ports are precise and match what might be found by using diary methods.
If discrepancies of this sort exist, the ED data may be useful for
reconstructing and examining overlooked symptomatic moments
for discussion during the therapy hour. Figure 3 displays negative
affect records from 1 day (using random prompts) provided by the
same borderline personality disorder patient whose data were
summarized in Figures 1 and 2. Individual ratings are plotted as
filled circles; the overall mean for the day is plotted as the dashed
horizontal line. Relative to the earlier data points and the overall
daily mean, it is clear that the rating made at 9:57 pm is signifi-
cantly elevated and may mark an event worthy of clinical discus-
sion. The example plot is annotated with other events that were
reported at the same recording as the elevated negative affect
score. A plot such as this (or simple scrutiny of the diary record)
can be used to jog a patient’s memory for the circumstances
surrounding a highly symptomatic moment and thereby can foster
productive discussion. The data may also help the patient gain
perspective on his or her experiences. For instance, if this patient
were to declare that he was “miserable all day Sunday,” the data
could be marshaled to show that, in fact, he was essentially
asymptomatic most of the day and is engaging in selective abstrac-
An interesting use of ED data, made possible when event-related
recordings and random moments are combined, is to study the
unique correlates of an important event (see Paty et al., 1992, for
Figure 1. Comparison of mood shifts from a patient with borderline
personality disorder, as identified from electronic diary records
collected over 4 weeks and by retrospective recall of mood shifts during
the same period as assessed with a calendar-based self-report instrument
at the end of Week 4. The top panel displays data on shifts in negative
mood, and the bottom panel shows shifts in positive mood. Both panels
plot the number of mood shifts for Weeks 1 through 4.
a fine treatment of this issue). We illustrate this idea by using diary
data from a college student (not in treatment or treatment seeking)
who carried an ED as part of a study of smoking behaviors. The
student was instructed to complete a user-initiated recording each
time a cigarette was smoked during a 2-week monitoring period.
He also responded to four random prompts spaced throughout the
day over the same 2-week period. This permits analyses following
the logic of a case-control design: Situational factors from the
smoking-related records can be compared with the base rates of
factors in the random prompts to identify high-risk situations.
Figure 2. Comparison of mood shifts across assessment modes (electronic
diary vs. retrospective calendar) during Week 4. Data were collected
from the patient shown in Figure 1. The top panel displays data on shifts
in negative mood, and the bottom panel shows shifts in positive mood.
Both panels plot the number of mood shifts for Days 1 through 8. Note
that Week 4 actually covered 8 days.
Figure 4 depicts such comparisons for two classes of antecedent
circumstances, ongoing stressors and current social contacts,
which were reported at each ED recording. The base rate infor-
mation is very useful for divining the important antecedents of
smoking. If only the smoking records were considered, for in-
stance, one might dismiss concerns about receiving a low grade as
an instigator of smoking because it was only reported in about 25%
of smoking records. Compared with the base rate of concern over
a low grade (about 10%), however, it seems that this particular
stressor may confer some unique risk. A similar point might be
made with respect to school workload stressors. Considering only
the smoking records, one might conclude that moments spent in
the company of friends are an especially powerful stimulus to
smoking—more than 75% of smoking was reported under these
conditions. However, the base rate of spending time with friends
was also very high (69%). This tempers conclusions about the size
of the effect, but the discrepancy in rates suggests the effect may
Other important influences on smoking might be overlooked
completely without the benefit of random prompt information. For
instance, zero smoking was reported in the presence of family or
the romantic partner. Although these moments occurred rarely,
they appeared to have a complete inhibitory effect on smoking—
and may suggest intervention strategies. This effect would not be
noticed in the smoking records alone because instances of these
moments do not appear in those records. Looking at the random
prompt data alone might not suggest family or partner contacts as
being especially important because they have low base rates in this
patient’s daily experience. Only combining the two data sources
reveals a potentially powerful social influence effect.
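The case-control logic described above amounts to computing, for each situational factor, the percentage of event records and the percentage of random-prompt records in which it was endorsed. The sketch below assumes each diary record is stored as a set of endorsed factor labels; the function name, the labels, and the data are made up for illustration and do not reproduce the records summarized in Figure 4.

```python
def endorsement_rates(event_records, random_records, factors):
    """Return {factor: (percent of event records, percent of random prompts)}.

    Each record is a set of factor labels endorsed at that recording.
    """
    def percent(records, factor):
        if not records:
            return float("nan")
        return 100.0 * sum(factor in record for record in records) / len(records)

    return {f: (percent(event_records, f), percent(random_records, f)) for f in factors}


# Made-up records: 4 smoking (event) records and 10 random prompts.
smoking = [{"friends"}, {"friends", "low grade"}, {"friends"}, {"alone"}]
random_prompts = [
    {"friends"}, {"family"}, {"friends", "low grade"}, {"alone"}, {"friends"},
    {"family"}, {"alone"}, {"friends"}, {"alone"}, {"friends"},
]

rates = endorsement_rates(smoking, random_prompts, ["friends", "low grade", "family"])
for factor, (event_pct, base_pct) in rates.items():
    print(f"{factor}: {event_pct:.0f}% of smoking records vs. {base_pct:.0f}% base rate")
```

A factor endorsed more often in event records than in random prompts is a candidate risk situation, whereas a factor present in random prompts but absent from event records (family, in these made-up data) is a candidate inhibitory influence. With so few records, such differences should be read as hypotheses for discussion rather than as established effects.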
Barriers to Adoption of EDs in Clinical Assessment
Significant economic and scientific barriers stand in the way of
widespread adoption of EDs in clinical practice. Historically,
clinical psychologists carved out a professional niche in the mental
health care industry by specializing in assessment (Benjamin,
2005). Over the past half century, psychologists have diversified
their professional activities and, as a consequence, have allocated
less time to assessment activities (Camara, Nathan, & Puente,
2000). The ascent of managed care has put more pressure on
psychological assessment. Increasingly, clinicians complain that
managed care has negatively affected their practices (Phelps, Eis-
man, & Kohut, 1998). Clinicians have difficulty obtaining reim-
bursement for services, spend increased time on paperwork, find
themselves having fewer approved sessions in which to effect
change with their clients, and spend less time in the direct provi-
sion of services. Managed care entities are reluctant to reimburse
clinicians for assessments that have not been empirically demon-
strated to aid treatment or reduce secondary health care costs
(Groth-Marnat, 1999). These pressures have driven psychologists
to administer less frequent or less comprehensive assessments
In their current stage of development, EDs may be difficult to
integrate with these realities of clinical practice. A recent
survey of clinical psychologists found that 81% spent fewer
Figure 3. Illustration of using electronic diary data to identify
significant moments and place them in context to stimulate discussion
during a treatment session. The figure presents negative affect ratings
from random prompts throughout the course of 1 day (at 9:24 a.m., 12:55
p.m., 2:38 p.m., 6:13 p.m., 8:20 p.m., and 9:57 p.m.) for the borderline
personality disorder patient whose mood shift data were depicted in
Figures 1 and 2. The dashed line indicates the day’s negative affect
mean. The annotation summarizes potentially important triggers of the
elevated negative mood that were reported concurrently: disagreement
with boss and coworker, rejection by boss and coworker, felt let down,
and experienced a loss.
than 5 hr per week on assessment-related activities, including
administering, scoring, and interpreting test results (Camara et
al., 2000). The most commonly used tests are standard measures
of cognitive abilities and personality. These tests are unlikely to
be supplanted by ED methods, suggesting that clinicians would
have to make room in their schedules to integrate diary moni-
Clinicians who already routinely use paper diary assessments
might expect some time savings from shifting to the use of EDs
because electronic data capture eliminates the need for data entry.
In one recent study comparing 8-day paper diaries and EDs in a
sample of Parkinson’s disease patients, the authors reported that
data entry and quality control for the paper diaries required 96
person-hours, whereas managing the electronic data required less
than 4 person-hours (Nyholm et al., 2004).
For clinicians integrating diary methods into their practices for
the first time, ED assessment could easily consume more time than
they want to spend. Although much of the data collection in a diary
assessment protocol happens outside the office on the client’s time,
use of EDs is still fairly time intensive for the clinician. The ED
protocol and assessment contents will typically need to be tailored
to the presenting complaints of the client. Some portion of therapy
sessions will need to be devoted to discussing the rationale for
diary monitoring, enlisting the client’s collaboration, training the
client in the use of the diary and the rationale of the diary protocol,
and sharing the results of the collected data with the client. The
assessor must conceive of the relevant data manipulations, manage
the data, conduct the analyses, and perhaps summarize the data in
a graphical or tabular form suitable for sharing with the client. To
the extent that ED assessments are tailored to the unique circum-
stances of individual clients, there may be no way to significantly
automate and speed these tasks. Time demands are likely to be
especially steep when an ED program is first adopted because the
clinician must become familiar with the supporting technologies
and work through the difficulties of integrating the methods into a
busy practice. Clinicians already squeezed for time may be reluc-
tant to incur these start-up costs in the absence of evidence that
EDs carry benefits outweighing them.
Making matters worse, there is little prospect that clinicians
would be reimbursed for ED assessments from managed care
entities in the near future. Although published investigations using
ED methods have often used clinical populations, studies typically
have been designed to evaluate the advantages of EDs or to probe
substantive theories. To date, EDs have not been extensively
integrated into empirically tested treatment protocols, and so it is
unclear whether any improvements in patient outcome or cost
efficiency would be obtained through their use. Managed care
organizations are unlikely to reimburse clinicians for time spent
conducting ED assessments until compelling health economic ben-
efits are demonstrated.
Figure 4. Example of using random- and event-based records to identify
circumstances uniquely associated with the event. A college student
initiated a record of each time a cigarette was smoked and responded to
random prompts four times per day over a 2-week period. The plot shows
the percentage of each type of assessment in which a particular ongoing
stressor (left half of the figure) or current social contact (right half)
was endorsed. Comparing the base rates (random prompts) with rates in the
smoking records allows the clinician to gauge effect sizes of
associations between smoking and particular antecedent events.
Can real clinical benefits be obtained from the use of ED
methods? We believe that this is fundamentally an empirical
question. Like any other instrument, EDs have the potential to be
used wisely or wastefully. The value of ED assessments might be
expected to vary depending on factors such as diary design, use,
patient population, and integration with the treatment protocol.
Groth-Marnat (1999) has recently noted the need for demonstrat-
ing the financial efficacy of clinical assessments and has described
several guidelines for financially defensible assessments. Among
these, several are natural fits with ED-based methods: (a) a focus
on domains relevant for treatment planning and outcomes, includ-
ing tailoring contents to the individual client; (b) the use of
computer-assisted assessment; (c) the formation of a closer link
between assessment, feedback, and intervention; and (d) integra-
tion of treatment planning, the monitoring of progress, and the
evaluation of outcomes.
Empirical evaluations of clinically relevant ED methods in
clinical settings are essential prerequisites for reassuring managed
care providers, piquing clinicians’ curiosity in the methods, and
stimulating entrepreneurial interest in developing and marketing
flexible diary products that are user friendly for both clinicians and
clients. Psychotherapy outcome research is perhaps the most likely
route through which ED methods will make inroads into current
practice. If academic clinical researchers integrate ED methods
into novel, manualized treatments, the ED protocols ultimately
might be disseminated alongside treatment manuals, client work-
books, and allied materials. Under this model of dissemination, the
EDs would probably be fairly standardized, minimally tailored to
the individual client. However, this standardization would lower
the barrier to clinical adoption, as would any demonstration that
the ED is part of an empirically supported treatment for a partic-
ular patient population. Although being encompassed by an em-
pirically supported treatment would likely speed adoption of ED
methods in routine practice, the strongest evidence for the use of
EDs would be derived from dismantling or manipulating assess-
ment designs (Hayes, Nelson, & Jarrett, 1987) by explicitly testing
whether a constant form of treatment is significantly improved
when an ED component is added.
In a recent review of trends in clinical assessment, Wood, Garb,
Lilienfeld, and Nezworski (2002) concluded that developments in
basic research “are likely to introduce radically new and unex-
pected developments into the assessment process” (p. 535). They
imagined the replacement of inkblots by DNA kits and pocket
neuroimagers, innovations considerably more radical than EDs.
We agree with the general precept that assessment practice will
ultimately have to follow significant developments in basic re-
search. Until relevant clinical studies are conducted, the jury is still
out on whether the benefits of EDs are compelling enough to merit
widespread use in clinical practice. However, broad trends make
clinical use of ED methods seem likely. First, the supporting
technologies are becoming ubiquitous. Hardware and software will
become more affordable, and clients will be increasingly comfort-
able carrying and interacting with mobile devices. Second, the
increasing familiarity with EDs and their unique benefits in aca-
demic circles has already led to their inclusion in clinical research
(e.g., Collins et al., 1998; Turner, Mancl, & Aaron, 2005). We
expect it is only a matter of time before EDs take center stage in
some clinical investigations, making the shift from being viewed
as a tool to evaluate treatments toward becoming an integral,
collaborative part of the treatment enterprise under study. Indeed,
mobile devices ultimately might be used to extend therapy delivery
into patients’ daily lives. That is, EDs may deliver tailored therapeutic
messages, not just record patients’ experiences (e.g., Brandon,
Copeland, & Saper, 1995; Burnett, Taylor, & Agras, 1985).
Like managed care providers, clinical researchers are concerned
with cost effectiveness, so exploration of the clinical benefits of
ED methods may be most likely to occur in studies of patient
populations associated with especially high costs for health care
systems (e.g., somatization, panic disorder, personality disorders).
Because great cost savings can be obtained by finding effective
ways to intervene with such conditions, they may be especially
likely to bear the cost of expensive diagnostic and assessment
practices. As the cost of ED monitoring drops over time, and
assuming tangible benefits of ED assessments can be demonstrated
in some domains of psychopathology and behavioral medicine,
EDs could be recruited for use with patients with a wider array of
Although the barriers to adoption are substantial, they must be
weighed against the benefits of using EDs in the clinical setting.
Collecting experience samples may introduce new and positive
dynamics to the therapeutic process. Donner (1992) noted the following:
The therapist who comments on a patient’s self-reports of experience
is not perceived as a magician who offers mysterious incantations
derived from inkblots or pictures. Rather, the therapist . . . works as a
consultant who helps refine patients’ self-understanding by reflecting,
clarifying, and exploring the patients’ self-reported experiences. (p.
Donner suggested that (a) the “scientific mystique” of EDs can
increase patients’ respect for the clinician and the therapeutic
process, (b) diary data can make therapists’ interpretations more
credible to the patient, and (c) the patient is likely to regard diary
data as more relevant to their personal experiences because they
are directly based on those experiences. Practitioners ultimately
may come to prize these kinds of intangible benefits as highly as
the technical features of ED methods.
In many health-related research domains, EDs are rapidly be-
coming the de facto standard because of their increasing afford-
ability and the conceptual and practical advantages they afford
over traditional diary methods. A similar evolution of the ED—
from clever curiosity to common tool of the trade—is very likely
to occur in routine clinical assessment. At present, clinicians who
use these techniques will probably be the most venturesome,
curious, and computer savvy. The continued proliferation of pow-
erful mobile electronic devices will tend to promote wider use of
EDs in clinical settings. Clinicians will become increasingly com-
fortable with the technologies, and patients will become increas-
ingly accustomed to carrying mobile devices and thus more open
to the idea of an ED. Convergence of devices (e.g., smart phones
with wireless Web access and PDA functionality) and increasingly
powerful device features (e.g., wireless file sharing between de-
vices) will increase the convenience of EDs and their functionality.
For instance, it is not difficult to imagine, on the basis of current
technologies, a time in the near future when a clinician could
wirelessly transfer a diary program to a patient’s own smart phone
or PDA and wirelessly transfer the data back at the next office visit
or stream the data back in real time. It may become possible to
arrange two or more diaries to communicate with one another
wirelessly so that a significant event on one device elicits re-
sponses from multiple members of the patient system as in family
or couples treatment. Though technological improvements will
drive adoption of EDs in clinical practice, widespread adoption
will require entrepreneurs to develop simple turnkey software
packages that are optimized for clinical practice, speeding diary
construction, implementation, and data analysis. Ultimately, a key
ingredient will be whether clinicians develop and maintain a
commitment to ongoing, high-quality assessment of clinical prob-
lems. If those interests are fostered, clients will be well served, and
the market will drive development of increasingly accessible ED
Affleck, G., Zautra, A., Tennen, H., & Armelli, S. (1999). Multilevel daily
process designs for clinical and consulting psychology: A preface for the
perplexed. Journal of Consulting and Clinical Psychology, 67, 746–754.
Anhøj, J., & Nielsen, L. (2004). Quantitative and qualitative usage data of
an Internet-based asthma monitoring tool. Journal of Medical Internet
Research, 6, e23.
Beck, A. T. (2005). The current state of cognitive therapy: A 40-year
retrospective. Archives of General Psychiatry, 62, 953–959.
Benjamin, L. T., Jr. (2005). A history of clinical psychology as a profession
in America (and a glimpse at its future). Annual Review of Clinical
Psychology, 1, 1–30.
Bolger, N., Davis, A., & Rafaeli, E. (2003). Diary methods: Capturing life
as it is lived. Annual Review of Psychology, 54, 579–616.
Bradburn, N. M. (2000). Temporal representation and event dating. In
A. A. Stone, J. S. Turkkan, C. A. Bachrach, J. B. Jobe, H. S. Kurtzman,
& V. S. Cain (Eds.), The science of self-report: Implications for research
and practice (pp. 49–61). Mahwah, NJ: Erlbaum.
Brandon, T. H., Copeland, A. L., & Saper, Z. L. (1995). Programmed
therapeutic messages as a smoking treatment adjunct: Reducing the
impact of negative affect. Health Psychology, 14, 41–47.
Broderick, J. E., Schwartz, J. E., Shiffman, S., Hufford, M. R., & Stone,
A. A. (2003). Signaling does not adequately improve diary compliance.
Annals of Behavioral Medicine, 26, 139–148.
Broderick, J. E., & Stone, A. A. (2006). Paper and electronic diaries: Too
early for conclusions on compliance rates and their effects—Comment
on Green, Rafaeli, Bolger, Shrout, and Reis (2006). Psychological
Methods, 11, 106–111.
Broderick, J. E., Stone, A. A., Calvanese, P., Schwartz, J. E., & Turk, D. C.
(2006). Recalled pain ratings: A complex and poorly defined task. The
Journal of Pain, 7, 142–149.
Burnett, K. F., Taylor, C. B., & Agras, W. S. (1985). Ambulatory
computer-assisted therapy for obesity: A new frontier for behavior
therapy. Journal of Consulting and Clinical Psychology, 53, 689–703.
Camara, W. J., Nathan, J. S., & Puente, A. E. (2000). Psychological test
usage: Implications in professional psychology. Professional Psychol-
ogy: Research and Practice, 31, 141–154.
Collins, R. L., Kashdan, T. B., & Gollnisch, G. (2003). The feasibility of
using cellular phones to collect ecological momentary assessment data:
Application to alcohol consumption. Experimental and Clinical Psycho-
pharmacology, 11, 73–78.
Collins, R. L., Morsheimer, E. T., Shiffman, S., Paty, J. A., Gnys, M., &
Papandonatos, G. D. (1998). Ecological momentary assessment in a
behavioral drinking moderation training program. Experimental and
Clinical Psychopharmacology, 6, 306–315.
Conner, T. (2006). The experience sampling resource page. Retrieved June
9, 2006, from http://psychiatry.uchc.edu/faculty/files/conner/ESM.htm
Conner Christensen, T., Feldman Barrett, L., Bliss-Moreau, E., Lebo, K., &
Kaschub, C. (2003). A practical guide to experience sampling proce-
dures. Journal of Happiness Studies, 4, 53–78.
Conner Christensen, T., Wood, J. V., & Feldman Barrett, L. (2003).
Remembering everyday experience through the prism of self esteem.
Personality and Social Psychology Bulletin, 29, 51–62.
Csikszentmihaly, M., & Larson, R. (1987). Validity and reliability of the
experience-sampling method. Journal of Nervous and Mental Disease,
Delespaul, P. A. E. G. (1992). Technical note: Devices and time-sampling
procedures. In M. deVries (Ed.), The experience of psychopathology:
Investigating mental disorders in their natural settings (pp. 363–373).
New York: Cambridge University Press.
Donner, E. (1992). Expanding the experiential parameters of cognitive
therapy. In M. deVries (Ed.), The experience of psychopathology: In-
vestigating mental disorders in their natural settings (pp. 260–269).
New York: Cambridge University Press.
Gaertner, J., Elsner, F., Pollmann-Dahmen, K., Radbruch, L., & Saba-
towski, R. (2004). Electronic pain diary: A randomized crossover study.
Journal of Pain and Symptom Management, 28, 259–267.
Gilbert, D. G., McClernon, F. J., Rabinovich, N. E., Plath, L. C., Jensen,
R. A., & Meliska, C. J. (1998). Effects of smoking abstinence on mood
and craving in men: Influences of negative-affect-related personality
traits, habitual nicotine intake, and repeated measures. Personality and
Individual Differences, 25, 399–423.
Green, A. S., Rafaeli, E., Bolger, N., Shrout, P. E., & Reis, H. T. (2006).
Paper or plastic? Data equivalence in paper and electronic diaries.
Psychological Methods, 11, 87–105.
Groth-Marnat, G. (1999). Financial efficacy of clinical assessment: Ratio-
nal guidelines and issues for future research. Journal of Clinical Psy-
chology, 55, 813–824.
Hammersley, R. (1994). A digest of memory phenomena for addiction
research. Addiction, 89, 283–293.
Hammond, D., Fong, G. T., Cummings, K. M., & Hyland, A. (2005).
Smoking topography, brand-switching, and nicotine delivery: Results
from an in vivo study. Cancer Epidemiology Biomarkers & Prevention,
Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The treatment utility
of assessment: A functional approach to evaluating assessment quality.
American Psychologist, 42, 963–974.
Haynes, S. N., Leisen, M. B., & Blaine, D. D. (1997). Design of individ-
ualized behavioral treatment programs using functional analytic clinical
case models. Psychological Assessment, 9, 334–348.
Haynes, S. N., & Yoshioka, D. T. (2007). Clinical assessment applications
of ambulatory biosensors. Psychological Assessment, 19, 44–57.
Helzer, J. E., Badger, G. J., Rose, G. L., Mongeon, J. A., & Searles, J. S.
(2002). Decline in alcohol consumption during two years of daily
reporting. Journal of Studies on Alcohol, 63, 551–558.
Hufford, M. R. (in press). Special methodological challenges and oppor-
tunities in ecological momentary assessment. In A. A. Stone, S. Shiff-
man, A. Autienza, & L. Nebeling (Eds.), The science of real-time data
capture: Self-reports in health research. New York: Oxford University
Hufford, M. R., Shields, A. L., Shiffman, S., Paty, J. A., & Balabanis, M.
(2002). Reactivity to ecological momentary assessment: An example
using undergraduate problem drinkers. Psychology of Addictive Behav-
iors, 16, 205–211.
Hufford, M. R., & Shiffman, S. (2003). Patient-reported outcomes: As-
sessment methods. Disease Management & Health Outcomes, 11, 77–
Hyland, M. E., Kenyon, C. A. P., Allen, R., & Howarth, P. (1993). Diary
SPECIAL SECTION: INNOVATIVE ASSESSMENT METHODS
keeping in asthma: Comparison of written and electronic methods.
British Medical Journal, 306, 487–489.
Kanfer, F. H. (1970). Self-monitoring: Methodological limitations and
clinical applications. Journal of Abnormal Psychology, 35, 148–152.
Kihlstrom, J. F., Eich, E., Sandbrand, D., & Tobias, B. A. (2000). Emotion
and memory: Implications for self-report. In A. A. Stone, J. S. Turkkan,
C. A. Bachrach, J. B. Jobe, H. S. Kurtzman, & V. S. Cain (Eds.), The
science of self-report: Implications for research and practice (pp. 81–
99). Mahwah, NJ: Erlbaum.
Lauritsen, K., Degl’Innocenti, A., Hendel, L., Praest, J., Lytje, M. F.,
Clemmensen-Rotne, K., & Wiklund, I. (2004). Symptom recording in a
randomized clinical trial: Paper diaries vs. electronic or telephone data
capture. Controlled Clinical Trials, 25, 585–597.
Lindsley, O. R. (1968). A reliable wrist counter for recording behavior
rates. Journal of Applied Behavior Analysis, 1, 77–78.
Litt, M. D., Cooney, N. L., & Morse, P. (1998). Ecological momentary
assessment (EMA) with treated alcoholics: Methodological problems
and potential solutions. Health Psychology, 17, 48–52.
Mahoney, K. (1974). Count on it: A simple self-monitoring device. Be-
havior Therapy, 5, 701–703.
Mazze, R. S., Shamoon, H., Pasmantier, R., Lucido, D., Murphy, J.,
Hartmann, K., Kuykendall, V., & Lopatin, W. (1984). Reliability of
blood glucose monitoring by subjects with diabetes mellitus. The Amer-
ican Journal of Medicine, 77, 211–217.
McFall, R. M., & Hammen, C. (1971). Motivation, structure, and self-
monitoring: Role of nonspecific factors in smoking reduction. Journal of
Consulting and Clinical Psychology, 37, 80–86.
Mehl, M. R., Pennebaker, J. W., Crow, D. M., Dabbs, J., & Price, J. H.
(2001). The Electronically Activated Recorder (EAR): A device for
sampling naturalistic daily activities and conversations. Behavior Re-
search Methods, Instruments, & Computers, 33, 517–523.
Menon, G., & Yorkston, E. A. (2000). The use of memory and contextual
cues in the formation of behavioral frequency judgments. In A. A. Stone,
J. S. Turkkan, C. A. Bachrach, J. B. Jobe, H. S. Kurtzman, & V. S. Cain
(Eds.), The science of self-report: Implications for research and practice
(pp. 63–79). Mahwah, NJ: Erlbaum.
Moses, E. B., & Barlow, D. B. (2006). A new unified treatment approach
for emotional disorders based on emotion science. Current Directions in
Psychological Science, 15, 146–150.
Mundt, J. C., Ferber, K. L., Rizzo, M., & Greist, J. H. (2001). Computer-
automated dementia screening using a touch-tone telephone. Archives of
Internal Medicine, 161, 2481–2487.
Niaura, R., Abrams, D. B., Shadel, W. G., Rohsenow, D. J., Monti, P. M.,
& Sirota, A. D. (1999). Cue exposure treatment for smoking relapse
prevention: A controlled clinical trial. Addiction, 94, 685–695.
Nosek, B. A., Banaji, M. R., & Greenwald, A. G. (2002). E-research:
Ethics, security, design, and control in psychological research on the
Internet. Journal of Social Issues, 58, 161–176.
Nyholm, D., Kowalski, J., & Aquilonius, S. (2004). Wireless real-time
electronic data capture for self-assessment of motor function and quality
of life in Parkinson’s disease. Movement Disorders, 19, 446–451.
O’Hara, M. W., & Rehm, L. P. (1979). Self-monitoring, activity levels, and
mood in the development and maintenance of depression. Journal of
Abnormal Psychology, 88, 450–453.
Palmblad, M., & Tiplady, B. (2004). Electronic diaries and questionnaires:
Designing user interfaces that are easy for all patients to use. Quality of
Life Research, 13, 1199–1207.
Paty, J. A., Kassel, J., & Shiffman, S. (1992). The importance of assessing
base rates for clinical studies: An example of stimulus control of smok-
ing. In M. deVries (Ed.), The experience of psychopathology: Investi-
gating mental disorders in their natural settings (pp. 347–352). New
York: Cambridge University Press.
Peeters, F., Nicolson, N. A., Berkhof, J., Delespaul, P., & de Vries, M.
(2003). Effects of daily events on mood states in major depressive
disorder. Journal of Abnormal Psychology, 112, 203–211.
Phelps, R., Eisman, E. J., & Kohut, J. (1998). Psychological practice and
managed care: Results of the CAPP Practitioner Survey. Professional
Psychology: Research and Practice, 29, 31–36.
Piotrowski, C. (1999). Assessment practices in the era of managed care:
Current status and future directions. Journal of Clinical Psychology, 55,
Redelmeier, D. A., & Kahneman, D. (1996). Patients’ memories for painful
medical treatments: Real time and retrospective evaluations of two
minimally invasive procedures. Pain, 66, 3–8.
Reis, H. T., & Gable, S. L. (2000). Event sampling and other methods for
studying everyday experience. In H. T. Reis & C. M. Judd (Eds.),
Handbook of research methods in social and personality psychology (pp.
190–222). New York: Cambridge University Press.
Robinson, M. D., & Clore, G. L. (2002). Belief and feeling: Evidence for
an accessibility model of emotional self-report. Psychological Bulletin,
Schmitz, J. M., Sayre, S. L., Stotts, A. L., Rothfleisch, J., & Mooney, M. E.
(2005). Medication compliance during a smoking cessation clinical trial:
A brief intervention using MEMS feedback. Journal of Behavioral
Medicine, 28, 139–147.
Schwartz, J. E., & Stone, A. A. (1998). Strategies for analyzing ecological
momentary assessment data. Health Psychology, 17, 6–16.
Schwarz, N. (1990). Assessing frequency reports of mundane behaviors:
Contributions of cognitive psychology to questionnaire construction. In
C. Hendrick & M. S. Clark (Eds.), Research methods in personality and
social psychology (pp. 98–119). Newbury Park, CA: Sage.
Schwarz, N., & Oyserman, D. (2001). Asking questions about behavior:
Cognition, communication, and questionnaire construction. American
Journal of Evaluation, 22, 127–160.
Scollon, C. N., Kim-Prieto, C., & Diener, E. (2003). Experience sampling:
Promises and pitfalls, strengths and weaknesses. Journal of Happiness
Studies, 4, 5–34.
Sharpe, J. P., & Gilbert, D. G. (1998). Effects of repeated administration of
the Beck Depression Inventory and other measures of negative mood
states. Personality and Individual Differences, 24, 457–463.
Shiffman, S. (1988). Behavioral assessment. In D. M. Donovan & G. A.
Marlatt (Eds.), Assessment of addictive behaviors (pp. 139–188). New
Shiffman, S., Dresler, C. M., Hajek, P., Gilburt, S. J. A., Targett, D. A., &
Strahs, K. R. (2002). Efficacy of a nicotine lozenge for smoking cessa-
tion. Archives of Internal Medicine, 162, 1267–1276.
Shiffman, S., Gwaltney, C. J., Balabanis, M. H., Liu, K. S., Paty, J. A.,
Kassel, J. D., et al. (2002). Immediate antecedents of cigarette smoking:
An analysis from ecological momentary assessment. Journal of Abnor-
mal Psychology, 111, 531–545.
Shiffman, S., Hufford, M., Hickcox, M., Paty, J. A., Gnys, M., & Kassel,
J. D. (1997). Remember that? A comparison of real-time versus retro-
spective recall of smoking lapses. Journal of Consulting and Clinical
Psychology, 65, 292–300.
Shiffman, S., Paty, J. A., Gnys, M., Kassel, J. A., & Hickcox, M. (1996).
First lapses to smoking: Within-subjects analysis of real-time reports.
Journal of Consulting and Clinical Psychology, 64, 366–379.
Shiffman, S., Paty, J. A., Gwaltney, C. J., & Dang, Q. (2004). Immediate
analysis of cigarette smoking: An analysis of unrestricted smoking
patterns. Journal of Abnormal Psychology, 113, 166–171.
Stone, A. A., Broderick, J. E., Schwartz, J. E., Shiffman, S., Litcher-Kelly,
L., & Calvanese, P. (2003). Intensive momentary reporting of pain with
an electronic diary: Reactivity, compliance, and patient satisfaction.
Pain, 104, 343–351.
Stone, A. A., Broderick, J. E., Shiffman, S. S., & Schwartz, J. E. (2004).
Understanding recall of weekly pain from a momentary assessment
PIASECKI, HUFFORD, SOLHAN, AND TRULL
perspective: Absolute agreement, between- and within-person consis- Download full-text
tency, and judged change in weekly pain. Pain, 107, 61–69.
Stone, A. A., & Shiffman, S. (1994). Ecological momentary assessment
(EMA) in behavioral medicine. Annals of Behavioral Medicine, 16,
Stone, A. A., & Shiffman, S. (2002). Capturing momentary self-report
data: A proposal for reporting guidelines. Annals of Behavioral Medi-
cine, 24, 236–243.
Stone, A. A., Shiffman, S., Schwartz, J. E., Broderick, J. E., & Hufford,
M. R. (2002). Patient non-compliance with paper diaries. British Med-
ical Journal, 324, 1193–1194.
Stone, A. A., Shiffman, S., Schwartz, J., Broderick, J. E., & Hufford, M. R.
(2003). Patient compliance with electronic and paper diaries. Controlled
Clinical Trials, 24, 182–199.
Sutton, S. (1993). Is wearing clothes a high risk situation for relapse? The
base rate problem in relapse research. Addiction, 88, 725–727.
Swift, R. M. (2000). Studies of a wearable, electronic, transdermal ethanol
sensor. Alcoholism: Clinical and Experimental Research, 24, 422–423.
Takarangi, M. K. T., Garry, M., & Loftus, E. F. (2006). Dear diary, is
plastic better than paper? I can’t remember: Comment on Green, Rafaeli,
Bolger, Shrout, & Reis (2006). Psychological Methods, 11, 119–122.
Tennen, H., Affleck, G., Coyne, J. C., Larsen, R. J., & DeLongis, A.
(2006). Paper and plastic in daily diary research: Comment on Green,
Rafaeli, Bolger, Shrout, and Reis (2006). Psychological Methods, 11,
Turner, J. A., Mancl, L., & Aaron, L. A. (2005). Brief cognitive-behavioral
therapy for temporomandibular disorder pain: Effects on daily electronic
outcome and process measures. Pain, 117, 377–387.
Vansteelandt, K., Pieters, G., Vandereycken, W., Claes, L., Probst, M., &
Van Mechelen, I. (2004). Hyperactivity in anorexia nervosa: A case
study using experience sampling methodology. Eating Behaviors, 5,
Williams, D. A., Gendreau, M., Hufford, M. R., Groner, K., Gracely, R. H.,
& Clauw, D. J. (2004). Pain assessment in patients with fibromyalgia
syndrome: A consideration of methods for clinical trials. Clinical Jour-
nal of Pain, 20, 348–356.
Wirtz, D., Kruger, J., Scollon, C. N., & Diener, E. (2003). What to do on
spring break? The role of predicted, on-line, and remembered experience
in future choice. Psychological Science, 14, 520–524.
Witkiewitz, K., & Marlatt, G. A. (2004). Relapse prevention for alcohol
and drug problems: That was zen, this is tao. American Psychologist, 59,
Wood, J. M., Garb, H. N., Lillienfeld, S. O., & Nezworski, M. T. (2002).
Clinical assessment. Annual Review of Psychology, 53, 519–543.
Received August 24, 2005
Revision received July 25, 2006
Accepted August 3, 2006