Validity of average, minimum, and maximum end-of-day recall assessments of pain and fatigue

Article in Contemporary Clinical Trials 31(5):483-90 · September 2010
DOI: 10.1016/j.cct.2010.06.004 · Source: PubMed
Abstract
End-of-day (EOD) diary assessments of symptoms have the potential to reduce recall bias associated with longer recall periods, and therefore be useful for generating accurate patient reported outcomes (PROs). In this report we examine the relative validity of diary questions about the experience of daily pain and fatigue, including several questions about experience for the entire day and questions about minimum and maximum daily levels, with previously collected data. Validity estimates are based on comparisons of EOD reports with momentary recordings of pain and fatigue from the same days. One hundred and six participants with rheumatologic diseases yielded 2852 days for analysis. Differences in levels as assessed by EOD and momentary reports were small (just a few points), although in many instances were significantly different. Correlational analyses indicated that "how much," "how intense," and "on average" EOD questions were more strongly associated with momentary reports (rs=0.85-0.90 for pain and 0.81-0.83 for fatigue) than were minimum and maximum questions (rs=0.73-0.80 for pain and 0.67-0.75 for fatigue). Overall, the pain measures had higher EOD-momentary correspondence than the fatigue measures. Analyses of difference scores between EOD and momentary reports confirmed the better correspondence of the average questions compared with minimum and maximum questions. There was little evidence of individual differences in level and correspondence analyses. The implication of these results is that over-the-day diary measures may yield superior PROs than those based on minimum or maximum daily levels.
Validity of average, minimum, and maximum end-of-day recall assessments
of pain and fatigue
Arthur A. Stone, Joan E. Broderick, Joseph E. Schwartz
Department of Psychiatry and Behavioral Science, Stony Brook University, Stony Brook, NY, USA
article info abstract
Article history:
Received 24 February 2010
Accepted 18 June 2010
© 2010 Elsevier Inc. All rights reserved.
Keywords:
PRO
Daily diary
Momentary measures
Pain
Fatigue
Patient Reported Outcomes (PROs) are patients' self-reports of their symptoms,
the impacts of their symptoms, and their behaviors. PROs have received
considerable attention because they provide a unique perspective on patients'
health and functioning [2]. One problem with self-report measures is the length
of the recall period [3,4], the amount of time to be considered when completing
an assessment. Long recall periods may stretch the ability of respondents to
accurately recall and summarize information, leading to concerns about accuracy
of reports [4].
By limiting the duration of the recall period, daily diaries should reduce
recall bias compared to assessments with longer recall periods (e.g., weeks),
and they can be aggregated over days to cover reporting periods typically used
by retrospective assessments [5]. Diary questions often ask about the entire
day's symptoms, but can also include questions about the day's least or lowest
level of a symptom or the day's worst or maximum level. Recently these
alternatives were explicitly suggested in the FDA's PRO Guidance document [6].
There are two reasons why least/worst levels may be appealing candidates for
assessment: 1) they avoid the potentially difficult cognitive process of
summarizing experience and 2) least/worst may be the construct of interest as
opposed to average experience over-the-day. For example, one can hypothesize
that treatments could reduce maximum pain during a day, yet have only a modest
impact on average levels. It is also notable that some weekly recall
questionnaires for pain assessment ask about least and worst levels (e.g., BPI;
[7]), indicating interest in these constructs.
Contemporary Clinical Trials 31 (2010) 483-490
Corresponding author. Department of Psychiatry and Behavioral Science,
Putnam Hall, Stony Brook University, Stony Brook, NY 11794-8790 USA. Tel.:
+1 631 632 8833; fax: +1 631 632 3165.
E-mail addresses: Arthur.stone@sunysb.edu (A.A. Stone),
joan.broderick@stonybrook.edu (J.E. Broderick),
Joseph.Schwartz@sunysb.edu (J.E. Schwartz).
1551-7144/$ - see front matter © 2010 Elsevier Inc. All rights reserved.
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/conclintrial
Some validity data are available for EOD diaries with
ratings of the over-the-day experience, and those results are
encouraging. One study of post-surgical patients compared
EOD recall of daily pain with the average, peak, and last-of-
day variables based on 5 randomly selected momentary
assessments [8]. EOD recall of pain intensity correlated about
0.70 with the average of momentary reports and only 4%
recall bias from peak and end pain was found. Second,
previous results from a subset of the current dataset showed
good correspondence between EOD diaries and momentary
reports for pain and fatigue measures [1]: correlations ranged
from 0.75 to 0.85.
To our knowledge, this is the first report to compare the validity of EOD
recalled over-the-day, least, and worst pain and fatigue diary questions with
multiple momentary assessments from the same day. For EOD questions of
"average" and "how much" pain/fatigue, we use the average of moments for the
same day as the validity criterion; for the EOD measures of "least"
pain/fatigue, we use the minimum value of the day's momentary reports; and, for
EOD measures of "worst" pain/fatigue, we use the maximum value of the day's
momentary reports. Evidence for the validity of EOD measures would be 1) that
their levels are similar to the corresponding momentary measure and 2) that the
correspondence over days between EOD and the momentary measure was high.
1. Methods
1.1. Participants
Patients were recruited from two offices of a community rheumatology practice.
Participants were required to be available for 30 consecutive days and to meet
the following eligibility criteria: at least 18 years of age;
physician-confirmed diagnosis of a chronic rheumatological illness; experienced
symptoms of pain or fatigue during the last week; no significant sight,
hearing, or writing impairment; fluency in English; normal sleep-wake schedule;
ability to come to the research office twice within a month; had not
participated in another electronic diary study in the last 5 years. A total of
279 patients were telephone screened, and 86 (31%) were excluded because they
failed one or more of the above eligibility criteria. Of the 193 eligible
patients, 76 (39%) declined participation, and 117 (61%) participated. We
examined the demographic characteristics of those who were eligible and
participated versus those who were eligible and declined participation. Age,
sex, educational achievement, marital status, race, and reported pain and
fatigue at screening were examined by participation status. A near-significant
difference was found for age, where those who participated (56.3 years) were
older than those who declined participation (52.8 years; t(191) = 1.94,
p = 0.053); none of the other comparisons were significant. Over the course of
the study eleven participants dropped out, and 106 completed the study. The
final sample was middle-aged (mean = 55.5 years), predominantly female (91%),
white (92%), married (65%), and well-educated (63% had at least some college).
1.2. Procedure
The study protocol was approved by the Stony Brook University Institutional
Review Board. Participants provided informed consent and were compensated $100.
Data were collected from September 2005 through June 2006. Eligible patients
came to the research office to complete demographic and questionnaire measures
and to be trained in the use of an electronic diary (ED). Momentary and daily
recall ratings of pain and fatigue intensity were collected for 29-31 days on a
hand-held computer (Palm Zire 31). The ED utilized a software program provided
by invivodata, inc. (Pittsburgh, PA) that featured auditory tones to signal the
participant to complete a set of momentary ratings. It was programmed to
generate an average of 7 randomly-scheduled (within intervals) prompts spread
across the participant's waking hours (an average of one every 2 h and 20 min,
constrained to ensure a minimum of 30 min between prompts). Waking hours were
determined by when the participant informed the ED that she was going to bed at
night and set the wake-up alarm for the next morning. In addition to the random
signals, the ED prompted the participant to complete a daily recall assessment
at the time the ED was put to sleep at night, the End-of-Day assessment. A
research assistant telephoned the patient 24 h after the initial research
office visit to answer any questions and troubleshoot potential problems with
using the ED. A follow-up call was made once per week for the following 3 weeks
to ensure the ED was working properly and to answer any questions. At the end
of the month, patients returned the ED to the research office.
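The prompt schedule described above can be illustrated with a short sketch. The article does not describe how the ED software drew its prompt times, so this is only one plausible reading of "randomly-scheduled (within intervals)": the waking day is cut into intervals of roughly 140 minutes, one prompt is drawn uniformly within each interval, and the day is redrawn if any two prompts fall less than 30 minutes apart. The function name, its parameters, and the redraw strategy are all hypothetical.

```python
import random


def schedule_prompts(wake_min, bed_min, interval_min=140, min_gap=30, rng=None):
    """Return prompt times (minutes since midnight) for one waking day.

    Assumption: one uniformly random prompt per interval, with the whole day
    redrawn until neighbouring prompts are at least min_gap minutes apart.
    This is an illustration, not the algorithm used by the study software.
    """
    rng = rng or random.Random()
    waking = bed_min - wake_min
    # Roughly one sampling interval per interval_min minutes of waking time.
    n_intervals = max(1, round(waking / interval_min))
    width = waking / n_intervals
    if n_intervals > 1 and width < min_gap:
        raise ValueError("intervals are too short for the requested minimum gap")
    while True:
        # One prompt at a random position inside each consecutive interval.
        times = [wake_min + (i + rng.random()) * width for i in range(n_intervals)]
        # Accept the draw only if every pair of neighbours respects min_gap.
        if all(later - earlier >= min_gap for earlier, later in zip(times, times[1:])):
            return times
```

Under these assumptions a participant awake from 07:00 to 23:00, i.e. `schedule_prompts(7 * 60, 23 * 60)`, receives 7 prompts, and a longer waking day yields more, which matches the remark in the Results that the number of prompts could exceed 7.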
1.3. Measures
Items for this study were drawn from the Brief Pain Inventory (BPI) [9] and the
Brief Fatigue Inventory (BFI) [10], with wordings modified to correspond to the
different reporting periods. Zero to 100-point Visual Analog Scales were used,
but scale endpoints varied according to question content. For the "how much
bodily pain" question the anchors were "none" (0) and "very severe" (100),
whereas for all other questions the anchors were "not at all" (0) and
"extremely" (100). The EOD questionnaire contained several questions that were
used to address the aims of this paper. Three asked about over-the-day levels
of pain: "How much bodily pain did you have?", "How intense was your bodily
pain?", and "What was your average level of pain today?" Another two questions
asked about the lowest ("What was the lowest level of your pain today?") and
highest ("What was the worst level of your pain today?") levels of pain for the
day. A parallel set of questions was available for the construct of
fatigue/tiredness: "How fatigued (weary, tired) did you feel?" and "How tired
did you feel?" There were also questions about the lowest ("What was the lowest
level of your fatigue today?") and highest ("What was the worst level of your
fatigue today?") levels of fatigue for the day. Each of these EOD questions
began with the stem "DURING THE DAY." The "how much" and "how intense" pain
questions and the two fatigue questions were also asked on a momentary basis,
each with the stem "BEFORE PROMPT." From each of these four momentary
questions, the average, the minimum, and the maximum were derived.
Our strategy for determining the validity of the EOD assessments is to compare
data collected at several random points during each day with the EOD
assessments. Momentary reports are thought to be relatively free from
distortion due to recall and, because they are sampled at random points from
the day, they provide an unbiased view of average daily pain and fatigue [11].
Two ways of comparing EOD and momentary reports are computed. Level differences
are defined as the average difference between EOD and momentary data from the
same day; correspondence differences are defined as the covariation between EOD
and momentary data over days. Level and correspondence are both relevant to
understanding the validity of recall [12]. These analyses were conducted with
data collected in a diary study that examined several recall periods (1-, 3-,
7- and 28-days [1]). We have previously reported on the level and
correspondence of EOD average pain/fatigue (based on only 1 week of data per
participant), and those analyses did not examine EOD reports about worst and
least pain/fatigue and did not use the full 4 weeks of data.
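The two validity indices defined above can be made concrete with a minimal sketch. It computes a raw mean difference and a raw Pearson correlation over a list of person-days; it does not reproduce the multilevel significance testing used in the article, and the function names and example ratings are invented for illustration.

```python
from math import sqrt


def level_difference(eod, momentary):
    """Mean of (EOD - momentary criterion) across person-days."""
    diffs = [e - m for e, m in zip(eod, momentary)]
    return sum(diffs) / len(diffs)


def correspondence(eod, momentary):
    """Pearson correlation between EOD ratings and the momentary criterion."""
    n = len(eod)
    mean_e = sum(eod) / n
    mean_m = sum(momentary) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(eod, momentary))
    var_e = sum((e - mean_e) ** 2 for e in eod)
    var_m = sum((m - mean_m) ** 2 for m in momentary)
    return cov / sqrt(var_e * var_m)


# Hypothetical ratings on the 0-100 scale, one pair per person-day.
eod_ratings = [50, 60, 40, 70]
momentary_means = [45, 55, 35, 65]
print(level_difference(eod_ratings, momentary_means))  # 5.0
print(correspondence(eod_ratings, momentary_means))    # 1.0
```

The invented example shows why both indices are needed: an EOD rating that is always 5 points too high has a nonzero level difference yet perfect correspondence.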
2. Results
For the analyses to yield good estimates of level differences and
correspondence, there had to be an adequate number of momentary assessments
each day. The design of the study specified 7 momentary samples per day,
although in practice this number varied from day to day. It could be greater
than 7 if a person was awake for more than 14 h, or it could be less than 7 if
compliance for the day was poor. To balance the goals of including as many days
in the analysis as possible, yet keeping a reasonable number of assessments, we
decided that at least 4 assessments per day would be required. This yielded a
sample of 106 participants with 2852 days of data (with an average of 5.6
reports per day and between 8 and 34 days in the study). In secondary analyses
described below, we tested the possibility that our estimates of EOD and
momentary differences were affected by the number of assessments per day by
conducting the analyses separately for days with 4-5 momentary assessments
(1406 days) and days with 6 or more momentary assessments (1446 days).
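As a minimal sketch of the day-level inclusion rule and the derived criteria, the function below returns the mean, minimum, and maximum of one day's momentary ratings, or nothing when the day has fewer than 4 reports. The function name and the example ratings are hypothetical; only the 4-report threshold comes from the text.

```python
def daily_criteria(ratings, min_reports=4):
    """Summarize one day's momentary ratings (0-100 scale).

    Returns None when the day has fewer than min_reports ratings, mirroring
    the rule that a day needs at least 4 momentary assessments to be analyzed.
    """
    if len(ratings) < min_reports:
        return None
    return {
        "mean": sum(ratings) / len(ratings),  # criterion for over-the-day EOD items
        "min": min(ratings),                  # criterion for EOD least/lowest items
        "max": max(ratings),                  # criterion for EOD worst items
    }


print(daily_criteria([20, 40, 60, 80]))  # {'mean': 50.0, 'min': 20, 'max': 80}
print(daily_criteria([30, 50, 40]))      # None (too few reports for this day)
```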
For pain, there were 3 EOD variables that were intended to capture the overall
experience of the day: how much bodily pain, how intense was bodily pain, and
what was the average level of pain intensity. For EOD comparisons with
momentary reports, which had 2 questions ("how much pain" and "how intense was
the pain"), we compared 1) EOD "how much pain" with the average of momentary
"how much pain," 2) EOD "how intense was pain" with the average of momentary
"how intense was pain," and 3) EOD "what was your average level of pain" with
the average of momentary "how much pain" (see Table 1). For the EOD lowest and
worst, we examined both momentary variables, how much pain and pain intensity.
The same strategy for comparing EOD to momentary variables was employed for the
fatigue comparisons.
2.1. Level differences
These analyses are intended to compare the level of pain/fatigue ratings from
EOD measures with the level of the corresponding momentary measures; for
instance, the level of EOD average pain with the average of momentary pain
assessments, or the level of EOD ratings of least pain with the minimum of the
momentary ratings of pain. Results of these analyses are shown in Table 1,
where the EOD questions are presented on the left side of the table and the
momentary variables to the right (footnote 1). Statistical testing of level
differences was done with multilevel modeling in order to control for
differences in the number of momentary reports per participant and to model the
nesting of days within person so as not to inflate the degrees of freedom used
for statistical testing [13].
Table 1
Levels of momentary assessments and end-of-day ratings and significance of their differences. Values are means (standard deviations).
Pain (N=2825 days)
End-of-day question | End-of-day mean (SD) | Momentary measure | Momentary mean (SD) | Z-test
How much bodily pain did you have? | 50.1 (24.5) | Average of "How much bodily pain did you have?" | 45.0 (22.7) | 9.61***
How intense was your bodily pain? | 47.1 (26.9) | Average of "How intense was bodily pain?" | 42.2 (24.2) | 11.01***
What was your average level of pain today? | 46.1 (22.1) | Average of "How intense was bodily pain?" | 42.2 (24.2) | 4.66***
What was the lowest level of your pain today? | 27.6 (22.5) | Computed: Minimum momentary "How much bodily pain" | 32.1 (23.5) | 4.23***
What was the lowest level of your pain today? | 27.6 (22.5) | Computed: Minimum momentary "How intense was bodily pain" | 29.4 (24.3) | 1.45
What was the worst level of your pain today? | 61.4 (24.4) | Computed: Maximum momentary "How much bodily pain" | 58.8 (23.4) | 3.71***
What was the worst level of your pain today? | 61.4 (24.4) | Computed: Maximum momentary "How intense was bodily pain" | 56.1 (25.6) | 5.97***
Fatigue
End-of-day question | End-of-day mean (SD) | Momentary measure | Momentary mean (SD) | Z-test
How fatigued (weary, tired) did you feel? | 51.2 (26.7) | How fatigued (weary, tired) did you feel? | 47.5 (23.1) | 6.80***
How tired did you feel? | 52.4 (25.5) | How tired did you feel? | 47.0 (22.1) | 9.52***
What was the least level of your fatigue today? | 28.6 (23.0) | Computed: Minimum momentary "How fatigued" | 30.9 (24.6) | 2.10*
What was the least level of your fatigue today? | 28.6 (23.0) | Computed: Minimum momentary "How tired" | 30.2 (23.3) | 1.66
What was the worst level of your fatigue today? | 61.6 (25.0) | Computed: Maximum momentary "How fatigued" | 64.9 (23.7) | 4.73***
What was the worst level of your fatigue today? | 61.6 (25.0) | Computed: Maximum momentary "How tired" | 65.0 (22.9) | 4.67***
Note. Testing difference score with multilevel modeling.
*p < 0.05, ***p < 0.001.
Footnote 1: Means and standard deviations in this table are slightly different from
those found in our earlier paper [1], because that paper used only 1 of the
4 weeks of data used in these analyses.
EOD reports of average pain and fatigue over-the-day are rated significantly
higher than the average of momentary reports of pain and fatigue. On the other
hand, EOD reports of least pain and fatigue are significantly lower than the
momentary minimums and EOD ratings of worst pain are higher than the momentary
maximums. The exception is that EOD ratings of worst fatigue were lower than
the momentary maximums. Across the comparisons, the level differences (averaged
across days and people) ranged from 1.6 to 5.4 points on a 101-point scale.
2.2. Correspondence differences
These analyses compare the correspondence or correlation between EOD and
momentary measurements. To estimate the association between EOD and momentary
scores, Pearson correlation coefficients were computed (footnote 2). It is
plausible that there could be error, especially for the least and worst
measures, in cases where there are a limited number of daily assessments
(footnote 3). To test this, the correlations for days that had 4 or 5 momentary
assessments (average=4.6) are compared to those with 6 or more momentary
assessments per day (average=6.6). There were only small differences in the
values between the full sample and the selected samples, thus indicating no
clear advantage for the sample with 6 or more daily assessments.
The second column of Table 2 presents the correlations with all days, the third
column the correlations for days with 4-5 moments, and the fourth column for
days with 6 or more moments. For pain, the range of correlations for EOD
over-the-day measures with their momentary counterparts is 0.85 to 0.90; for
the EOD minimum with its momentary counterparts the range is 0.73-0.74; and,
for EOD maximum with its momentary counterparts the range is 0.78-0.80. For
fatigue, the range for EOD over-the-day measures with their momentary
counterparts is 0.81-0.83; for the EOD minimum the range is 0.67-0.68; and, the
EOD maximum 0.72-0.75. All of these correlations are statistically significant.
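Footnote 2 mentions within-subject pooled correlations as an alternative to the raw correlations reported here. The sketch below shows one way such a correlation could be computed, by centering each person's EOD and momentary scores on that person's own means before pooling all person-days. The article does not give the authors' exact procedure, so the function name, the record layout, and the example data are assumptions.

```python
from collections import defaultdict
from math import sqrt


def within_person_correlation(records):
    """Pooled within-person correlation of EOD and momentary scores.

    records: iterable of (person_id, eod_score, momentary_score) person-days.
    Each score is centered on the person's own mean, which removes
    between-person differences in level from the correlation.
    """
    by_person = defaultdict(list)
    for person_id, eod, momentary in records:
        by_person[person_id].append((eod, momentary))

    centered_eod, centered_mom = [], []
    for days in by_person.values():
        mean_eod = sum(e for e, _ in days) / len(days)
        mean_mom = sum(m for _, m in days) / len(days)
        centered_eod.extend(e - mean_eod for e, _ in days)
        centered_mom.extend(m - mean_mom for _, m in days)

    # Centered scores have overall mean zero, so no further centering is needed.
    cov = sum(a * b for a, b in zip(centered_eod, centered_mom))
    ss_eod = sum(a * a for a in centered_eod)
    ss_mom = sum(b * b for b in centered_mom)
    return cov / sqrt(ss_eod * ss_mom)


# Hypothetical data: person "a" reports low pain, person "b" high pain.
records = [("a", 10, 18), ("a", 20, 12), ("b", 70, 72), ("b", 80, 78)]
print(within_person_correlation(records))  # 0.0
```

In this invented example the raw correlation would be very high because the two people differ so much in overall level, while the within-person correlation is zero. That is the between-person effect the footnote says centering removes.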
2.2.1. Discrepancies between EOD and momentary measures
What are a priori acceptable levels of difference for EOD reports versus their
momentary counterparts? Because there are no standards for guiding the answer
to this question, we evaluate two levels of error for the reader to consider.
Accepting a seemingly small degree of error, we chose a difference score
acceptability range of plus or minus 10 points on the 101-point VAS, and for
the wider threshold we used plus or minus 20 points. The difference was
computed by subtracting the momentary value from the EOD value. For example, to
compute the proportion of 10-point discrepancies for "Worst" pain versus the
maximum momentary value for the day, a difference was said to exist if the
absolute difference of (Worst Pain - Maximum Momentary Pain) was greater than
or equal to 10. The percentages of days that had these levels of difference are
shown in Table 2. There is more error for EOD least and worst pain and fatigue
relative to error rates for the EOD average and usual scores. Yet another way
to examine these data is by histograms of the differences between the EOD
variables and momentary variables, as shown in Fig. 1 for EOD average pain
(upper panel) and for "how fatigued" (lower panel) (other pain measures and
fatigue measures not shown). A much narrower distribution of errors for the
over-the-day variables compared with minimum and maximum variables is evident.
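The discrepancy rule can be sketched in a few lines. The sketch follows the wording of the text above (absolute difference greater than or equal to the threshold); the Table 2 column headings instead read "greater than", so the choice of inequality is an assumption, and the function name and example ratings are invented.

```python
def discrepancy_rate(eod, momentary, threshold):
    """Proportion of person-days where |EOD - momentary| >= threshold."""
    flagged = sum(1 for e, m in zip(eod, momentary) if abs(e - m) >= threshold)
    return flagged / len(eod)


# Hypothetical "worst pain" EOD ratings and same-day momentary maximums.
eod_worst = [60, 45, 80, 30]
momentary_max = [52, 40, 55, 41]  # differences: 8, 5, 25, -11
print(discrepancy_rate(eod_worst, momentary_max, 10))  # 0.5
print(discrepancy_rate(eod_worst, momentary_max, 20))  # 0.25
```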
2.3. Individual differences (footnote 4)
It is plausible that there are individual differences in level and
correspondence between EOD variables and their corresponding momentary-based
variables. Given the potentially large number of comparisons and the secondary
nature of these analyses, we examined individual differences in a subset of the
data. One pain variable (Pain Intensity) and one fatigue variable (Fatigue
Intensity) were chosen to represent the two content domains. The individual
difference variables selected were also a subset of all possible variables;
they were age (less than 57 years [n=52] vs. greater than or equal to 57 years
[n=54]), educational level (Some college or less [n=60] vs. College graduate or
more [n=45]), overall health status based on SF-36 global health question (Poor
or Fair health [n=38] vs. Good, Very Good or Excellent health [n=68]), average
level of EMA pain (for the pain variables only, Low [n=53] vs. High [n=53],
based on median split), and average level of EMA fatigue (for the fatigue
variables only, Low [n=53] vs. High [n=53], based on median split). Gender was
not considered because there were so few men in the sample.
To examine individual differences in level, new person-level variables were
computed to represent the difference between EOD and momentary variables; this
was done for daily average, daily minimum, and daily maximum for both pain and
fatigue intensity, yielding 6 variables. To test individual differences in
level, the mean difference scores between EOD and EMA were compared between the
two groups defined by each individual attribute. To examine individual
differences in correspondence, the correlations between EOD variables and their
corresponding momentary-based variables were computed separately for each group
and the difference tested; for example, the correlations between EOD pain and
momentary pain were compared for older and younger persons (the individual
difference variable). To be consistent with the set of correspondence analyses
shown earlier, the unit of analysis for these comparisons was a person-day.
2.3.1. Individual differences in level
Table 3 presents the EOD-EMA difference scores for each level of the individual
difference variables (column headings) with significance level of t-tests. A
total of 24 t-tests were computed and 4 were significant at p < 0.01. All
significant differences at this alpha level were found for the test of EOD
least pain or fatigue versus EMA minimum pain or fatigue. However, these tests
were spread across the five individual difference variables, suggesting that
none of the individual difference variables was consistently associated with
level differences.
Footnote 2: An alternative method of examining correspondence is to compute
within-subject pooled correlations, which center each individual's scores
around their own mean, eliminating any between-person effects from the
correlations. Such correlations were slightly lower than those presented in
Tables 2 and 4, but the pattern of the correlations was the same as the raw
correlations; therefore, only the raw correlations are presented.
Footnote 3: Theoretically, the likelihood of actually capturing the least or
worst pain during the day should increase as a function of the number of
assessments per day, with more assessments resulting in more accurate (less
error) estimates of minimum and maximum.
Footnote 4: We thank an astute reviewer for suggesting this exploration of
individual differences.
2.3.2. Individual differences in correspondence
Table 4 presents correlations between EOD and EMA measures separately for the
mean, minimum, and maximum variables for each subgroup defined by the
individual difference variables. We examined the table for major differences in
correlations for the low and high level of the individual difference variables;
no difference or a small difference would indicate no evidence for individual
differences whereas major differences would indicate individual differences.
There were 24 comparisons and most differences between correlations were 0.05
or smaller. The largest difference was 0.12 for the Age variable with the
Minimum variables and only a total of three correlation comparisons were
discrepant by 0.10 or more. This suggests that individual differences play only
a minor role in EOD-EMA correspondence.
3. Discussion
The goal of this report was to provide empirical evidence pertaining to the
validity of end-of-day reports of pain and fatigue (over-the-day, least, and
worst) that are used by outcomes researchers. EOD diaries are an expedient
means of assessing overall or average levels of an outcome while trying to
minimize recall bias. The researcher has the option of averaging as many daily
assessments as needed for their research purposes (e.g., over 7 days to
characterize the outcome for a week). The FDA's PRO Guidance encourages the use
of diaries and brief recall periods, and it mentions using diaries to collect
information about worst and least daily outcomes [6]. This paper reports the
first comparison of the validity of EOD worst and least assessments based on
comparisons with momentary reports from the same day.
Table 2
Correlations of EOD assessments with the corresponding momentary measure. Unit of analysis is a person-day.
EOD question / momentary measure | Correlation between EOD and momentary (N=2825 days) | Correlation between EOD and EMA (4-5 moments/day; mean=4.6; N=1384 days) | Correlation between EOD and EMA (minimum of 6 moments/day; mean=6.6; N=1441 days) | Proportion of all days with a difference of greater than 10 points (on 0-100 point scale) | Proportion of all days with a difference of greater than 20 points (on 0-100 point scale)
Pain
EOD: How much / Moment: Mean how much | 0.88 | 0.88 | 0.88 | 36% | 11%
EOD: How intense / Moment: Mean how intense | 0.90 | 0.90 | 0.90 | 33% | 12%
EOD: Average / Moment: Mean how much | 0.85 | 0.84 | 0.85 | 36% | 11%
EOD: Lowest level / Moment: Min how much | 0.74 | 0.76 | 0.73 | 45% | 21%
EOD: Lowest level / Moment: Min how intense | 0.73 | 0.75 | 0.72 | 46% | 22%
EOD: Worst / Moment: Max how much | 0.80 | 0.80 | 0.80 | 44% | 17%
EOD: Worst / Moment: Max how intense | 0.78 | 0.78 | 0.78 | 47% | 21%
Fatigue
EOD: How much / Moment: Mean fatigued | 0.83 | 0.84 | 0.82 | 41% | 16%
EOD: How much / Moment: Mean tired | 0.81 | 0.83 | 0.78 | 43% | 18%
EOD: Lowest fatigued / Moment: Min fatigued | 0.67 | 0.69 | 0.66 | 50% | 25%
EOD: Lowest fatigued / Moment: Min tired | 0.68 | 0.70 | 0.67 | 50% | 24%
EOD: Worst fatigued / Moment: Max fatigued | 0.75 | 0.76 | 0.73 | 46% | 21%
EOD: Worst fatigued / Moment: Max tired | 0.72 | 0.75 | 0.69 | 48% | 22%
Fig. 1. Histograms of difference scores between EOD and momentary variables for how much pain, lowest pain, and worst pain (upper panels) and how fatigued, least fatigue, and worst fatigue (lower panels).
Over all comparisons for both pain and fatigue, EOD levels were within a few
points of their momentary counterparts. The largest mean differences were about
5 points; and although most were statistically different, they will probably be
viewed as minor from a clinical point of view. These discrepancies are less
than we have found when comparing 7-day recall of average pain and fatigue to
aggregated momentary reports, where the mean discrepancies were as large as 15
points [1]. Thus, on average, EOD reports closely reflect the average, least,
and worst levels of pain and fatigue as measured by momentary reports for a
reporting period of a single day. Importantly, they appear to introduce less
recall bias than recall ratings using a 7-day reporting period. However, on an
individual-day basis, we observed many discrepancies of 10, 20, and more
points.
Results for correspondence yielded distinct patterns between the over-the-day
versus least and worst measures; there was also an overall pattern observed for
pain measures versus fatigue measures. Correspondence with their momentary
counterparts was higher for the EOD over-the-day measures ("How much," "How
intense," and "Average" questions) than for the "least" and "worst" measures:
from 10% to 20% more variance was shared between EOD and momentary variables in
the over-the-day versus the least or worst measures. We view these as
significant differences with the results suggesting that EOD ratings of "How
much" or "How intense" or "On average" for daily pain or fatigue are more valid
EOD measures than EOD ratings of "least" or "worst" pain or fatigue. This
result was confirmed with the analyses of discrepancy scores that were based on
two thresholds for defining differences between EOD and momentary measures.
Considerably lower rates of large discrepancies were found for EOD over-the-day
measures compared with least and worst EOD measures. These correlations are
higher than we have found when comparing 7-day recall of average pain and
fatigue and aggregated momentary reports, which is likely the result of the
shorter recall period in this study [1]. Another general observation was that
pain measures were more closely associated with momentary assessments than were
fatigue measures, which was also the case in our previous comparison of weekly
recall to average momentary experience [1]. We do not have a satisfactory
explanation for this difference between pain and fatigue, but it may be that
pain is a relatively more distinct state than fatigue and, hence, is recalled
more accurately even within a period as short as a day.
There was the possibility that there were individual differences in
discrepancies in end-of-day versus momentary measures of mean, minimum, and
maximum pain and fatigue. We examined this possibility in a subset of the pain
and fatigue measures and with a set of five factors that could moderate level
or correspondence differences. Only a few significant effects were observed for
the individual differences variables and they were spread over the individual
differences variables, suggesting either none or only a minor effect of the
variables. Of course, variables that were not examined could have individual
difference effects, but this seems less likely given the current findings.
Table 3
Individual differences in Level (Person is unit of analysis; n = 106).
Age Education Health Pain level Fatigue level
Low High Low High Low High Low High Low High
Pain
Mean difference 5.3 5.0 5.2 4.8 5.2 5.1 5.9 4.4 4.9 5.4
Minimum difference 3.7 5.0 0.6 9.1*** 3.8 4.7 0.3 9.1*** 4.9 3.8
Maximum difference 2.7 2.8 2.5 3.0 2.8 2.7 4.5 0.9*
Fatigue
Mean difference 2.9 4.3 3.6 3.3 3.2 3.8 3.0 4.2 3.4 3.8
Minimum difference 5.5 1.1** 1.0 3.4 2.0 2.3 0.2 4.1 2.5 6.8***
Maximum difference 3.5 3.5 4.8 1.9 2.8 3.9 3.3 3.7
*p < 0.05 **p < 0.01 ***p < 0.001.
Table 4
Correspondence differences by individual differences (day is unit of analysis; n = 2825).
Age Education Health Pain level Fatigue level
Low High Low High Low High Low High Low High
Pain
Mean 0.88 0.88 0.89 0.87 0.83 0.89 0.78 0.80
Minimum 0.79 0.67 0.80 0.71 0.68 0.75 0.63 0.61
Maximum 0.80 0.79 0.81 0.77 0.70 0.81 0.71 0.68
Fatigue
Mean 0.82 0.82 0.83 0.84 0.80 0.83 0.71 0.71
Minimum 0.69 0.64 0.68 0.66 0.62 0.68 0.49 0.53
Maximum 0.73 0.74 0.76 0.73 0.70 0.76 0.68 0.58
A.A. Stone et al. / Contemporary Clinical Trials 31 (2010) 483-490
There are several limitations that must be considered in the
interpretation of these results. One is that the comparisons
of EOD minimums and maximums with the momentary assessments
were based on a limited number of momentary assessments. It
is possible that the momentary assessments missed the highest
and lowest symptom levels, thus not capturing the true
minimum and maximum levels. Two aspects of the results
suggest this was not a major factor in the analysis. The
first is that the means of recalled maximums or minimums were
not very different from their momentary assessment
counterparts, though they were mostly in the predicted
direction. If momentary assessments had regularly missed true
maximums or minimums, then we would have expected a larger
difference between recall and momentary averages, with much
larger recalled maximums and much lower recalled minimums.
But this was not the case. One possible explanation is that
periods of low or high pain/fatigue typically last for
several hours and not just for brief intervals, thus being
captured by at least one momentary assessment. The second is
that increasing the number of daily momentary assessments
from 4 to 5 per day to 6 or more per day had very little
impact on the results, which reduces but does not eliminate
our concern about the coverage of momentary assessments.
Finally, there is also the possibility that our estimates of
level and correspondence differences are biased upward
because participants' EOD evaluations may have been more
accurate due to the momentary recording they did throughout
the day. That is, the EMA component of the design may have
enhanced their memories of pain and fatigue at the end of the
day.
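The coverage argument above can be made concrete with a small Python sketch. It is a simplified model and not part of the study: it assumes that prompts fall independently and uniformly over a waking day of fixed length and that there is a single contiguous episode of extreme pain or fatigue. The function name, the 16-hour waking day, and the episode durations are all hypothetical.

```python
# Simplified, hypothetical model of momentary-assessment coverage.
# Assumes independent, uniformly timed prompts over the waking day,
# which only approximates any real sampling schedule.

def capture_probability(episode_hours, prompts_per_day, waking_hours=16.0):
    """Chance that at least one prompt lands inside a single episode."""
    if not 0 <= episode_hours <= waking_hours:
        raise ValueError("episode must fit within the waking day")
    if prompts_per_day < 0:
        raise ValueError("prompts_per_day must be non-negative")
    # Probability that one prompt misses the episode entirely.
    miss_one = 1.0 - episode_hours / waking_hours
    # All prompts must miss for the episode to go unrecorded.
    return 1.0 - miss_one ** prompts_per_day


# Compare a brief episode with one lasting several hours (made-up values).
for hours in (0.5, 3.0):
    for prompts in (5, 7):
        p = capture_probability(hours, prompts)
        print(f"{hours} h episode, {prompts} prompts/day: {p:.2f}")
```

Under these assumptions, an episode lasting several hours is far more likely to be sampled than a brief one, which is the intuition behind the explanation offered in the text. The sketch says nothing about how large the resulting bias would be in real data.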
If the intention of a study is to detect changes in least
pain or worst pain experienced during the day, then it would
make little sense to use the more valid over-the-day measure,
since it assesses a different construct. However, if there
was flexibility in the choice of outcome measures for a trial
or if it was unclear which outcomes would be affected by
treatment, then we believe the results reported here indicate
that over-the-day measures have better psychometric
properties than measures of least and worst pain and fatigue.
In summary, this study suggests that end-of-day reports of
over-the-day pain and fatigue were strongly associated with
momentary assessments of the same symptoms, justifying their
use as PROs. We are less sanguine about the use of "Worst"
and "Least" measures given their weaker associations with
momentary assessments, but they did have substantial
correlations with momentary reports, which may also justify
their use.
Acknowledgments
This work was supported by a grant from the National
Institutes of Health (1 U01-AR052170-01; Arthur A. Stone,
principal investigator) and by GCRC grant no. M01-RR10710
from the National Center for Research Resources. A.A.S. is a
Senior Consultant for invivodata, inc., and a Senior Scientist
with the Gallup Organization.
References
[1] Broderick JE, Schwartz JE, Vikingstad G, Pribbernow M, Grossman S,
Stone AA. The accuracy of pain and fatigue items across different
reporting periods. Pain Sep 30 2008;139(1):146-57.
[2] Burke LB, Kennedy DL, Miskala PH, Papadopoulos EJ, Trentacosti AM.
The use of patient-reported outcome measures in the evaluation of
medical products for regulatory approval. Clin Pharmacol Ther
2008;84:281-3.
[3] Bradburn NM, Rips LJ, Shevell SK. Answering autobiographical
questions: the impact of memory and inference on surveys. Science
1987;236:151-67.
[4] Gorin AA, Stone AA. Recall biases and cognitive errors in retrospective
self-reports: a call for momentary assessments. In: Baum A, Revenson T,
Singer J, editors. Handbook of Health Psychology. Mahwah, N.J.:
Erlbaum; 2001. p. 405-14.
[5] Bolger N, Davis A, Rafaeli E. Diary methods: capturing life as it is lived.
Annu Rev Psychol 2003;54:579-616.
[6] Guidance for Industry. Patient-reported outcome measures: Use in
medical product development to support labeling claims. 2009;
http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM193282.pdf.
[7] Cleeland C. Pain assessment: global use of the Brief Pain Inventory. Ann
Acad Med Singapore 1994;23:129-38.
[8] Jensen MP, Mardekian J, Lakshminarayanan M, Boye ME. Validity of
24-h recall ratings of pain severity: biasing effects of "Peak" and "End"
pain. Pain 2008;137:422-7.
[9] Daut RL, Cleeland CS. The prevalence and severity of pain in cancer.
Cancer Nov 1 1982;50(9):1913-8.
[10] Mendoza TR, Wang XS, Cleeland CS, Morrissey H, Johnson BA, Wendt JK,
et al. The rapid assessment of fatigue severity in cancer patients - Use of
the brief fatigue inventory. Cancer 1999;85(5):1186-96.
[11] Stone AA. The science of real-time data capture: self-reports in health
research. Oxford; New York: Oxford University Press; 2007.
[12] Stone AA, Broderick JE, Schwartz JE, Shiffman S, Litcher-Kelly L,
Calvanese P. Intensive momentary reporting of pain with an electronic
diary: reactivity, compliance, and patient satisfaction. Pain Jul
2003;104(1-2):343-51.
[13] Schwartz JE, Stone AA. Data analysis for EMA studies. Health Psychol
1998;17:6-16.