Pittsburgh and Epworth Sleep Scale Items: Accuracy of Ratings Across Different Reporting Periods.
ABSTRACT This study examined the ecological validity of sleep experience reports across different lengths of reporting periods. The accuracy of item responses on the Pittsburgh Sleep Quality Index (PSQI) and Epworth Sleepiness Scale (ESS) across 3-, 7-, and 28-day reporting periods was examined in relation to electronic daily item ratings. Primary care clinic patients (N = 119) were recruited, and were not required to have sleep problems to participate. Analyses found few differences in item scores when electronic daily ratings were compared with recall ratings, regardless of the length of the reporting period. However, within-subjects analyses indicated low levels of accuracy in recall of sleep items for specific days in the last week. Thus, for the purpose of between-subject comparisons, patients generally can provide accurate recall of sleep experiences; studies requiring finer-grained analysis across time and within-subjects require daily diary methodology.
Behavioral Sleep Medicine, in press
5 November 2011
Pittsburgh and Epworth Sleep Scale Items:
Accuracy of ratings across different reporting periods
Joan E. Broderick, Ph.D.,
Doerte U. Junghaenel, Ph.D., Stefan Schneider, Ph.D.,
John J. Pilosi, B.A., Arthur A. Stone, Ph.D.
Department of Psychiatry & Behavioral Science
Stony Brook University
Joan E. Broderick, Ph.D.
Department of Psychiatry and Behavioral Science
Putnam Hall, South Campus
Stony Brook University
Stony Brook, NY 11794-8790
Key words: sleep, measurement, validity, diary, patient reported outcomes
ED = Electronic diary
ESS = Epworth Sleepiness Scale
IVR = Interactive Voice Recording
PRO = Patient reported outcome
PSQI = Pittsburgh Sleep Quality Index
This study examined the ecological validity of sleep experience reports across different lengths
of reporting periods. The accuracy of item responses on the Pittsburgh Sleep Quality Index
(PSQI) and Epworth Sleepiness Scale (ESS) across 3-, 7-, and 28-day reporting periods was
examined in relation to electronic daily item ratings. Primary care clinic patients (N=119) were
recruited and were not required to have sleep problems to participate. Analyses found few
differences in item scores when electronic daily ratings were compared with recall ratings
regardless of the length of the reporting period. However, within-subject analyses indicated low
levels of accuracy in recall of sleep items for specific days in the last week. Thus, for the
purpose of between-subject comparisons, patients generally can provide accurate recall of
sleep experiences; studies requiring finer-grained analysis across time within subjects will
require daily diary methodology.
In 2004 the National Institutes of Health awarded multi-site collaboration grants to
improve the measurement of patient reported outcomes (PROs) for clinical trials (see
http://www.nihpromis.org/default.aspx). One component of this initiative has been to examine
the ecological validity of PROs, that is, the level of association of real-time PROs with recall (A.
Stone & Shiffman, 1994). Our first studies focused on pain, fatigue, and interference with daily
functioning in patients with chronic illness (Broderick, Schneider, Schwartz, & Stone, 2010;
Broderick et al., 2008). These and other studies demonstrated some discrepancies between
ratings based on real-time assessment and recall ratings of a week or more (Salovey, Smith,
Turk, Jobe, & Willis, 1993). Here, we extend this work by looking at sleep PROs. Approximately
30-50% of the general population report insomnia, daytime sleepiness, or other sleep
difficulties, yet the problems are often clinically overlooked (Buysse et al., 2008; Hossain &
Shapiro, 2002). The current study was designed to examine the ecological validity of the
Pittsburgh Sleep Quality Index (PSQI) and the Epworth Sleepiness Scale (ESS) items in a
sample of patients attending a primary care clinic.
This study examined the ecological validity for each item on the PSQI and ESS across
three different reporting periods: 3 days, 7 days, and 28 days. Ecological validity reflects the
degree to which a measure is a true index of experience in the respondent’s daily life (A. Stone
& Shiffman, 1994). Daily reports on a hand-held computer provided ratings less subject to
memory loss and recall bias (Stone, Shiffman, Schwartz, Broderick, & Hufford, 2003). These
electronic daily reports were compared with recall ratings of items referencing the three different
reporting periods during a month of daily ratings. Based on results from prior work, we
hypothesized (1) that patients on average would over-report nighttime sleep problems and
daytime sleepiness problems on the recall ratings relative to aggregated daily reports, (2) that
the amount of over-reporting of problems would be greater for longer than for shorter recall
periods, and (3) that the correlation between aggregated daily reports and recall reports would
differ across reporting periods, in that the correlation would be greater for shorter recall periods
than for longer recall periods. We also examined the accuracy of 7 day-by-day recall ratings on
4 items made during the final study visit. These data will help to determine the optimal reporting
period for accuracy in the reporting of these PRO domains.
Patients were recruited from the Primary Care Clinic at Stony Brook University Medical
Center. The research staff approached patients in the waiting room, and those who were
interested could provide contact information, and were screened on the phone for eligibility.
Eligibility criteria were: older than 18 years, fluent in English, no visual or hearing impairment,
no difficulty holding a pen or writing, typically awake before 10 AM and asleep after 7 PM, no
night shift job that leads to daytime sleep, no substance abuse or cognitive deficits, and able to
travel to the research laboratory two times in a month. Patients were included in the study
without regard to sleep disorders or other diseases. This approach was used to ensure a high
degree of variability in responses to PSQI and ESS items needed for generalizable results.
The Stony Brook University Institutional Review Board approved the study (Approval
#6845). During their first laboratory visit, patients gave written informed consent and were
trained in the use of the electronic diary (ED) to record daily completion of PSQI and ESS items.
The ED was a Palm Pilot computer running the open-source Experience Sampling Program
(ESP; http://www.experience-sampling.org) to capture symptom ratings. The software recorded
the time and date of each entry. One follow-up phone call at 24 hours and four weekly follow-up
calls were conducted to ensure that electronic recording was going well. During the next month,
participants completed both morning (PSQI items) and evening (PSQI and ESS items) reports
on the ED shortly after waking and shortly before going to sleep. The ED had an alarm feature
that alerted participants to complete their daily ratings. After 30 to 36 days, patients returned to
the laboratory with their ED and completed additional questionnaires about their health and
sleep quality. During this visit, as a further probe of recall accuracy, participants were presented
with 7 recall cards, one for each of the previous 7 days, each containing 4 items. This enabled
assessment of memory accuracy for each of the 7 previous days.
In addition to the daily ED ratings, recall ratings for the three different reporting periods
were made with an Interactive Voice Recording (IVR) system. At the laboratory visit, participants
were provided with five numerically labeled envelopes holding the recall questionnaires.
Participants were informed that they would be telephoned by the IVR system on five randomly
selected nights during the month of the study. On those nights, they would be informed which
recall questionnaire they should complete (3-, 7-, or 28-day). In order to avoid anticipatory
monitoring of sleep quality and daytime fatigue, participants were not informed of the dates they
would be contacted to complete the recall questionnaires via IVR. When called, they were
instructed to record their responses on the paper questionnaire, and then to call back and
record their responses via IVR. This procedure allowed time and date stamping of the recall
ratings. To enhance data quality, patients were instructed to place each completed paper
questionnaire into the mail the next day to permit comparison with IVR entries in the event of
missing entries or outliers. If a participant missed their scheduled call on a particular evening, a
research assistant called the participant the next morning to inquire about reasons for and/or
difficulties in completing the ratings. Another attempt to reach the participant was made by the
IVR system the subsequent evening without the participant's prior knowledge.
Each participant was randomly assigned to 1 of 6 different recall schedules that
specified when, within the 30-36 day study period, the two 3-day and the two 7-day recall
assessments would take place. The schedules were designed with no overlap of days for the
two 3-day and two 7-day recall periods, and one of each of the 3-day and 7-day recalls took
place during the week and the others on the weekend. The one 28-day recall was fixed at the
end of the study for all participants (and overlapped with the other recall periods). Participants’
length of participation varied between 30 and 36 days based on the specific IVR calling
schedule to which they were randomized. Participants were compensated $125 for full
completion of this research study.
The PSQI and the ESS are two very widely used instruments for measurement of sleep
dysfunction. The PSQI is a self-report global measure of sleep (Buysse, Reynolds, Monk,
Berman, & Kupfer, 1989). The reporting period is one month, and scores >5 on the global scale
indicate clinically meaningful sleep disturbance. Test-retest reliability (46 days) for the global
score is good (r= .86); although for some of the subscales, it can be substantially less (e.g., r=
.23 for sleep quality) (Backhaus, Junghanns, Broocks, Riemann, & Hohagen, 2002; Buysse, et
al., 1989). The PSQI has demonstrated the ability to differentiate among a number of patient
populations with varying sleep quality, convergent and discriminant construct validity (Carpenter
& Andrykowski, 1998), and responsivity to treatment to improve sleep (Krakow et al., 2004).
Correlations between paper sleep diaries and PSQI sleep duration reports (r=.81) and sleep
onset latency reports (r=.71) are good; however, the diaries indicate longer sleep duration and
shorter sleep onset latency than the PSQI (Backhaus, et al., 2002; Sharkey et al., 2010).
The ESS measures self-reported excessive daytime sleepiness (Johns, 1991). It has
been used to probe for indicators of a variety of sleep disorders including obstructive sleep
apnea, insomnia, and narcolepsy. Higher scores indicate greater levels of daytime sleepiness,
and scores >10 indicate excessive daytime sleepiness. No reporting period is specified.
Correspondence between ESS scores and polysomnography, the respiratory disturbance index,
and the apnea-hypopnea index is often low, suggesting that the ESS is not a sensitive
indicator of objectively measured sleep disturbance (Kaminska et al., 2010; Sauter et al., 2000),
and correspondence with the Multiple Sleep Latency Index is also generally modest (Johns,
2000). Nevertheless, it has been found to be stable across repeated measures (7 months; ICC
= .87) and sensitive to treatment for apnea (Kaminska, et al., 2010; Massie & Hart, 2003).
Patients completed the standard PSQI and ESS scales during their first laboratory visit.
For daily ED ratings and IVR recall ratings, the PSQI items were slightly modified to
accommodate the recall period of the ratings. For example, the PSQI item, “During the past
month, how long (in minutes) has it usually taken you to fall asleep each night? # of minutes
___” was presented on the ED as “Last night, it took me about __ minutes to fall asleep.”
Likewise, the IVR item was “During the past (3, 7, or 28 days), how long (in minutes) has it
usually taken you to fall asleep each night?” The same strategy was applied to ESS items. The
ESS instructs respondents to rate each activity on a 0-3 scale for the likelihood of dozing off
while doing the activity. We wanted to examine how often patients reported dozing
while engaged in the ESS activities. For our ED and IVR ratings, we therefore asked two questions. For the ED we
asked, “At any time today, were you sitting and reading?” (yes or no). For patients who
responded “yes,” it was followed by “While you were sitting and reading, did you doze off or fall
asleep?” (yes or no). Similarly, for the IVR ratings, we asked “During the past (3, 7, or 28) days,
on how many days were you sitting and reading? __# of days.” This was followed by “On how
many of those days did you doze off or fall asleep while sitting and reading? __ # days” (details
available from authors).
At the last laboratory visit, patients made recall ratings for 4 items for each of the last 7
days: (1) what time the participant went to bed, (2) whether the participant had trouble sleeping
because he/she could not get to sleep within 30 minutes, (3) a sleep quality rating (1-4 scale), and
(4) how sleepy he/she was overall during the day (1-7 scale), which was used to capture
daytime sleepiness. Participants could write “CR” (can’t remember) next to any items that could
not be recalled.
Compliance criteria for ED reports and IVR recall reports
ED reports and IVR recall reports were examined for compliance with the protocol
assessment schedule. An insufficient number of ED reports during a reporting period could
result in inaccurate estimates of the reporting period, and comparisons with IVR recall reports
could be biased if IVR reports were not completed on the last day of the reporting period.
An IVR recall rating was considered compliant if it was completed on the evening that
the participant was contacted to make a rating. If an IVR report was missed, the participant was
contacted the following evening; completing the report on that evening was considered
compliant for the 28-day report, as well as for the 3-day and 7-day reports if it took place for the
first time during the study protocol.
Compliance criteria for the ED ratings were: a morning-report had to be completed
between 5 AM and 2 PM; an evening-report had to be completed between 6 PM and 3 AM for
any given day. We required that a patient complete all 3 ED assessments for the 3-day period,
at least 6 for the 7-day period, and at least 21 for the 28-day period. These criteria were
assessed separately for morning- and evening-reports; that is, patients could meet the criteria
for neither report, for only the morning or only the evening report, or for both reports.
To be included in the analyses, a patient had to meet the IVR and ED
compliance criteria for at least one out of the two 3-day and 7-day reporting periods, as well as
for the 28-day period. If a participant met the criteria for both of the 3-day or both of the 7-day
periods, the data from the period that occurred first in the study protocol were analyzed.
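The ED compliance rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the inclusive treatment of the window boundaries, and the premise that each diary day contributes at most one timestamp are ours, not details given in the protocol.

```python
from datetime import datetime, time

# Minimum number of compliant ED reports per reporting period (from the text).
MIN_REPORTS = {3: 3, 7: 6, 28: 21}


def in_window(kind, t):
    """Return True if clock time t lies in the completion window for this report kind."""
    if kind == "morning":
        return time(5, 0) <= t <= time(14, 0)  # 5 AM to 2 PM (boundaries assumed inclusive)
    if kind == "evening":
        # The window wraps past midnight: 6 PM to 3 AM.
        return t >= time(18, 0) or t <= time(3, 0)
    raise ValueError(f"unknown report kind: {kind}")


def period_compliant(kind, timestamps, period_days):
    """timestamps: one completion datetime per diary day (assumed already de-duplicated)."""
    valid = sum(1 for ts in timestamps if in_window(kind, ts.time()))
    return valid >= MIN_REPORTS[period_days]


# Hypothetical 7-day period: six on-time morning reports and one late report.
reports = [datetime(2011, 1, d, 7, 30) for d in range(1, 7)] + [datetime(2011, 1, 7, 15, 0)]
print(period_compliant("morning", reports, 7))
```

Morning and evening reports would be checked separately with this function, mirroring the separate assessment described above.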
Aggregation of ED data across days of a reporting period
For each participant, ED responses were averaged across the days of a reporting
period. ED responses for items addressing the occurrence of discrete events (e.g., whether
sleep medication was taken that day) were averaged across days and multiplied by 100, to
represent the percentage of days an event was endorsed. IVR recall responses for these items
(e.g., on how many days sleep medication was taken) were transformed correspondingly to
represent the percentage of days a problem was indicated.
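As a minimal sketch of this transformation, the following Python functions put a yes/no ED item and the corresponding IVR day count on the same percentage-of-days scale. The function names and the example values are hypothetical and serve only to illustrate the arithmetic.

```python
def ed_percent_days(daily_flags):
    """daily_flags: 0/1 ED responses, one per completed day of the reporting period."""
    return 100.0 * sum(daily_flags) / len(daily_flags)


def ivr_percent_days(days_reported, period_days):
    """days_reported: number of days recalled via IVR; period_days: 3, 7, or 28."""
    return 100.0 * days_reported / period_days


# Hypothetical patient: event endorsed on 2 of 7 diary days, and 2 days recalled via IVR.
ed_value = ed_percent_days([1, 0, 0, 1, 0, 0, 0])
ivr_value = ivr_percent_days(2, 7)
print(round(ed_value, 1), round(ivr_value, 1))
```

With both values expressed as a percentage of days, the difference between them can be compared directly across the 3-, 7-, and 28-day periods.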
Analysis of level differences between ED and IVR recall
Repeated measures analysis of variance was used to examine level differences
between ED and IVR recall reports. To address the hypothesis that symptom severity and
frequency would be higher in IVR recall than in aggregated ED reports, we examined the main
effect of reporting method (ED/IVR, within-subjects) across all three reporting periods (3-, 7-,
28-day, within subjects). To address the hypothesis that the difference between IVR recall and
aggregated ED reports would be greater for longer than for shorter reporting periods, we
examined the reporting method (ED/recall) by reporting period (3-, 7-, 28-day) interaction term
in 2x3 repeated-measures analysis of variance.
Analysis of correspondence between ED and IVR recall
Correspondence between ED reports and IVR recall reports was examined with
correlation analyses. To address the hypothesis that the correlation between IVR recall and
aggregated ED reports would differ across the 3-, 7-, and 28-day reporting periods, we had to
take into account that these correlations were non-independent (i.e., they came from the same
sample). Thus, we estimated the 3 correlations simultaneously as part of the full correlation
matrix of all reports (a 6x6 correlation matrix of 3-, 7-, 28-day ED, and 3-, 7-, 28-day IVR). A
Wald Chi-square test was used to test the null-hypothesis of no differences across reporting
periods. Analyses were conducted separately for each item.
The compliance criteria minimized missing data, but allowed for missing ED reports on
some days (see compliance criteria); multiple imputation was used to account for the missing
data. For each missing ED data point, we randomly selected a value from all available ED
ratings for that person and item, generating a set of five imputed databases. Multiple
imputations allow for variation across the five data sets to reflect the uncertainty of imputed
data, and research indicates that five is a sufficient number of datasets when the rate of missing
responses is low (Schafer, 1997). Analyses were performed using Mplus (Version 4).
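The imputation step described above (drawing each missing ED value at random from the same person's available ratings on that item, repeated to give five datasets) might look roughly like the following Python sketch. The actual analyses were run in Mplus; the function name, the use of `None` for missing days, and the fixed seed are assumptions made for illustration.

```python
import random


def impute_person_item(ratings, n_datasets=5, seed=0):
    """ratings: daily values for one person and one item, with None marking a missing day.

    Returns n_datasets completed copies; each missing value is replaced by a random
    draw from that person's observed ratings on the same item.
    """
    observed = [r for r in ratings if r is not None]
    if not observed:
        raise ValueError("no observed ratings to draw from")
    rng = random.Random(seed)  # fixed seed only so the illustration is repeatable
    return [
        [r if r is not None else rng.choice(observed) for r in ratings]
        for _ in range(n_datasets)
    ]


# Hypothetical 7-day sleep latency series (minutes) with two missing days.
latency = [20, None, 35, 15, None, 25, 30]
for completed in impute_person_item(latency):
    print(completed)
```

Each analysis would then be run on all five completed datasets and the results combined, so that the spread across datasets reflects the uncertainty of the imputed values.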
Compliance and analysis sample
Figure 1 shows the flow of patients into the study. Telephone screening of 195 patients
found 22 (11%) were ineligible. Of the 173 eligible patients, 48 (28%) declined participation. Six
patients dropped out, and 119 completed the study.
Overall, compliance criteria were met for 89% of IVR recall reports (87% for 3-day, 87%
for 7-day, and 95% for 28-day recall, respectively), for 86% of ED morning reports, and for 88%
of ED evening reports. Data were excluded due to IVR or ED noncompliance or ED malfunction.
The analysis samples were n = 83 for morning report items, and n = 87 for evening report items
(with a combined sample size of n = 94). In the analysis samples, there were no missing IVR
ratings. There were no missing days for the 3-day reporting period. For the 7-day reporting
period, 15 out of 581 (2.6%) days were missing for morning reports, and 20 out of 609 (3.3%)
for evening reports. For the 28-day reporting period, 128 out of 2,324 (5.5%) days were missing
for morning reports, and 121 out of 2,436 (5.0%) days were missing for evening reports.
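The denominators in these missing-data rates appear to be person-days, that is, the analysis sample size multiplied by the number of days in the reporting period. The short Python check below reproduces the reported figures under that reading; the variable names are ours.

```python
morning_n, evening_n = 83, 87  # analysis samples for morning and evening report items

# (missing days, sample size, days in period), as reported in the text
cases = {
    "7-day morning": (15, morning_n, 7),
    "7-day evening": (20, evening_n, 7),
    "28-day morning": (128, morning_n, 28),
    "28-day evening": (121, evening_n, 28),
}

rates = {}
for label, (missing, n, days) in cases.items():
    person_days = n * days
    rates[label] = (person_days, round(100.0 * missing / person_days, 1))
    print(label, person_days, rates[label][1])
```

The computed person-day totals (581, 609, 2,324, and 2,436) and percentages match those given above.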
Participants had a mean age of 57 years (range 21 to 83, SD = 13.9) and tended to be
female (68%), married (61%), and White (87%) (see Table 1). As would be expected in a
Primary Care Clinic, the age distribution was skewed with the majority (68%) of the participants
> 50 years of age. Self-reported general health was described as “fair” or “poor” by 8% of the
sample. Half (51%) of the patients met the cutoff for "poor sleepers" (score >5) on the PSQI
(Buysse, et al., 1989). Twenty-five percent of the patients had ESS scores >10, the most
common cut-off for excessive daytime sleepiness (Sanford et al., 2006).
There were no statistically significant differences between the analyzed and excluded
participants on the demographic variables. Analyzed and excluded participants also did not
differ on baseline ESS scores (p = .60); however, baseline PSQI indicated greater sleep
problems in excluded than in analyzed participants (mean PSQI scores of 9.8 versus 6.8,
respectively; p = .006).
Comparison of rating level differences between aggregated daily and recall ratings across reporting periods
Our first hypothesis was that on average patients would over-report nighttime sleep
problems and daytime sleepiness in recall reports relative to aggregated ED reports. This
hypothesis was not supported (see Tables 2 and 3). For only 1 (4%) out of 25 items was the
mean IVR recall response significantly (p < .05) higher than the mean ED response: patients
reported "dozing off while watching TV" on 23.0% (IVR) versus 19.2% (ED) of the days. In
contrast to the hypothesis, patients reported fewer sleep problems on recall on several items.
Specifically, "problems with keeping up enthusiasm to get things done" was noted significantly
less often on recall, as were sleeping difficulties due to breathing problems and due to
coughing/snoring. However, these differences were small in magnitude, in that on all items the
self-reported occurrence differed by less than 4% of the days between IVR recall and ED
methods. In addition, patients reported going to bed 24 minutes earlier (p < .001) and getting up
11 minutes earlier (p < .01) on the recall ratings compared with the daily ratings.
The hypothesis that the degree of over-reporting of symptoms would be greater for
longer than for shorter reporting periods was also not supported. Statistically significant (p < .05)
interactions between reporting method (ED / IVR) and reporting period (3-, 7-, 28-day) were
found for 8 (32%) of the items (see Tables 2 and 3). The direction of these interactions
suggested that the IVR recall ratings indicated more sleep problems than ED reports over the
shorter (3-day) reporting period, whereas recall ratings for the longer reporting period (28-day)
indicated fewer sleep problems than ED reports. However, on average across all items, the
percentage of days for which problems were reported was 1.3% greater, 0.2% greater, and
1.1% smaller in IVR recall than ED reports for the 3-, 7- and 28-day reporting periods,
respectively. This suggests that the overall evidence for an effect of the length of recall on the
degree of under- or over-reporting of symptoms is very weak.
Correspondence between ED and recall ratings across reporting periods
Correlations between aggregated ED reports and IVR recall reports are shown in Tables
4 (for morning reports) and 5 (for evening reports). For time ratings (time gone to bed and time
woke up) and durations (sleep latency and duration of sleep), the correlations between recall
and aggregated ED ratings were high for all reporting periods, with an average correlation of r =
.85 (range .69 to .95). Similarly, moderate to high correspondence between recall and
aggregated ED reports was evident for items reporting frequencies of sleeping problems with an
average correlation of r = .84 (range .49 to .97). However, correlations were generally lower for
ESS items pertaining to ratings of the number of days of daytime sleepiness during various
activities with an average correlation of r = .62 (range .06 to .95). This latter finding may in part
be explained by the low prevalence of daytime sleepiness problems reported for several items
(below 5% of the days for some items, see Table 3).
The hypothesis that longer reporting periods would result in lower correspondence was
not supported. Even though the magnitude of the correlations between ED and IVR recall
reports differed significantly (p < .05) between reporting periods for 11 (41%) out of the 25
items, the highest correlations did not occur consistently in any one of the 3-, 7-, and 28-day periods (see
Tables 4 and 5). The average correlation between ED and IVR recall reports was r = .74 for the
3-day, r = .75 for the 7-day, and r = .77 for the 28-day reporting period, respectively.
Recall ratings of each of the last 7 days
At the last laboratory visit, patients made recall ratings on 4 items for each of the last 7
days. They were given the option of responding that they could not remember. Ten percent of
patients reported inability to remember 3 days previously for the 3 sleep quality items, and 10-
20% could not remember these experiences by the 5th through 7th previous days. Patients found
it even more difficult to remember bedtime: 24% could not remember 3 days ago, and 44%
could not remember 7 days ago.
We examined level differences and correlations between recall ratings for each of the 7
days and the ED ratings obtained on corresponding days (during the patients’ last week in the
study). The analyses were conducted on the 94 patients who were included in the analysis
sample for the main hypotheses. Across 7 days and 4 items, only 2 of 28 level differences
between recall ratings and ED ratings were significant (p ≤.05). Across the 4 items, the average
correlation for “yesterday” was .80. For two through 7 days ago, the
correlations ranged from .51 to .66. Thus, the between-subject correlations between recall and
ED ratings decline quickly, starting with two days ago. Indeed, 71% of the correlations were less
than .70, and 25% were ≤ .50 indicating sub-optimal correspondence.
An even more relevant analysis to determine recall-rating accuracy is to examine the
within-subject correlations between recall and ED ratings across the 7 days for each item.
Whereas the between-subject analyses examined differences between people for a given day,
within-subject analyses examine the variation of responses from day to day within a person.
Thus, the within-subject correlations more directly address the question of whether patients can
recall and differentiate a given day from other days in the past week. The average within-subject
correlation pooled across patients for the bedtime item was .58, while the correlations for the 3
qualitative items relating to sleep ranged from .20 to .26, indicating poor correspondence.
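The distinction between the two kinds of correlation can be illustrated with a small Python sketch on made-up data. The within-subject value here is computed as the simple average of per-person correlations, which is an assumption and may differ from the pooling used in the actual analysis; all names and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def between_subject_r(recall, diary, day):
    """Correlate recall with diary ratings across people, for one given day."""
    return pearson([p[day] for p in recall], [p[day] for p in diary])


def within_subject_r(recall, diary):
    """Average, over people, of each person's correlation across days."""
    rs = [pearson(r, d) for r, d in zip(recall, diary)]
    return sum(rs) / len(rs)


# Three hypothetical patients, four days each. Recall reproduces each patient's
# overall level but not which days were higher or lower.
diary = [[1, 2, 3, 2], [4, 5, 6, 5], [7, 8, 9, 8]]
recall = [[2, 3, 2, 1], [5, 6, 5, 4], [8, 9, 8, 7]]

print(between_subject_r(recall, diary, day=0))
print(within_subject_r(recall, diary))
```

In this toy example the between-subject correlation for the first day is perfect while the within-subject correlation is zero, which is the pattern the text describes: patients can be ranked accurately against each other even when they cannot tell their own days apart.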
This study was designed to examine the ecological validity of Pittsburgh Sleep Quality
Index and Epworth Sleepiness Scale item responses across 3 reporting periods: 3-day, 7-day,
and 28-day. Specifically, the data were analyzed for comparability of reports of items such as
sleep quality, latency, duration, and instances of daytime sleepiness when aggregated daily
reports were compared with recall reports. Good comparability would indicate acceptable
ecological validity. Research on other patient reported outcomes, such as pain, fatigue and
physical functioning, suggested that, as the reporting period gets longer, patients report higher
levels of symptoms, and the between-subject correlations of daily and recall ratings can vary by
length of reporting period (Broderick, et al., 2010; Broderick, et al., 2008). We hypothesized that
the ecological validity of sleep-related items might show some of the problems observed in other
populations and item domains, given that the specific nature of problems assessed in many of
the PSQI and ESS items might be difficult to remember across a month reporting period.
Our results did not support the first and second hypotheses that patients would over-report
sleep difficulties on recall and that the length of the recall period would systematically
impact patients’ accuracy of PSQI and ESS item responses. Patients’ accuracy on recall was
comparable, regardless of the length of the reporting period from 3 days through a month. Some
rating differences were observed when specific items were compared on recall versus daily
reports. Although some of these differences were statistically significant, they were small in magnitude and are probably not meaningful.
Generally, across all of the items, the levels on the recall ratings were very close to those
generated by the aggregated daily ratings. This suggests that patients are able to accurately
report both sleep and daytime sleepiness experiences for recall periods of up to a month.
Likewise, between-subject correlations of aggregated daily and recall reports demonstrated
good correspondence for all items except those with very low frequencies observed, particularly
the daytime sleepiness items. Thus, the third hypothesis was generally not supported.
A final set of analyses testing the accuracy of day-by-day recall for the last 7 days of the
protocol (data collected on the last laboratory visit) provides some insight into the nature of
recall of PROs. As we observed in a previous study of pain ratings where 19-34% of patients
could not remember their pain 5 to 7 days ago (Broderick, et al., 2008), 10-40% of this study’s
patients acknowledged that they had difficulty remembering their sleep experiences beyond
several days. When we examined the within-subject correlations of the 7 day-by-day
ratings made at the end of the study with the corresponding daily ratings, we found exceedingly
low correlations for the 3 qualitative items (in the .20s). This means that patients cannot accurately identify days from the past
week that had relatively high and low levels of symptoms. Nevertheless, the between-subject
level and correlation analyses on the 7 day-by-day ratings show good correspondence,
suggesting that patients are able to generate ratings that accurately reflect their level relative to
other patients. We interpret these results as evidence that patients “know” their typical/average
sleep and daytime sleepiness experiences, and they can report them accurately for between-
subject comparisons for short and long reporting periods. However, our data suggest that this
accuracy is not due to patients always being able to remember their sleep-related experiences
on the specific days included in the reporting period. For example, if patients “know” when they
generally go to bed, when they wake up, and how often they fall asleep watching television,
this would be enough to generate high between-subject correlations. However, they cannot
remember whether last Monday was better or worse than last Tuesday. For these reasons, we
conclude that sleep-related PROs, with items like the PSQI and ESS, can be administered with
confidence for week or month-long reporting periods for between-subject analyses. In the case
of studies requiring a finer-grained within-subject analysis across days, daily diary measurement is required.
There are strengths and limitations in this study. One study limitation is that the
response options that patients were given on the ED and IVR recall questions in this study were
modified from the original PSQI and ESS scales to allow for direct comparison between the
daily and recall assessment methods. The response options on the PSQI and ESS standard
instruments are often ranges, whereas the options for the IVR recall items required that the
patient report a specific number of days for a sleep problem. Thus, the recall task in this study
could be more challenging than completion of the standard instruments. Therefore, this study
does not directly address the accuracy of patient reports using the standard response options.
However, the fact that more specific responses were required in this study lends strength to the
conclusions. Second, this study was observational and did not collect data during an
intervention. The accuracy data reported in this study may not generalize to recall ratings during
a reporting period with change, as in a clinical trial. Third, it is possible that because patients
were focusing on their sleep experiences each day, their ability to recall them might have been
improved relative to recall ratings made in the absence of daily reports. However, we tested this
possibility in a previous study of pain and found no evidence (Stone et al., 2003). Fourth, we