Journal of Consulting and Clinical Psychology Copyright 1997 by the American Psychological Association, Inc.
1997, Vol. 65, No. 2, 292-300 0022-006X/97/$3.00
Remember That? A Comparison of Real-Time Versus
Retrospective Recall of Smoking Lapses
Saul Shiffman, Michael Hufford, Mary Hickcox, Jean A. Paty, Maryann Gnys, and Jon D. Kassel
University of Pittsburgh
Research and treatment assessments often rely on retrospective recall of events. The accuracy of
recall was tested using accounts of smoking lapse episodes from 127 participants who had quit
smoking. Lapses and temptations were recorded in near-real time using a hand-held computer.
These computer records were compared with retrospective accounts elicited 12 weeks later, with a
focus on recall of lapses in 4 content domains: mood, activity, episode triggers, and abstinence
violation effects. Recall of lapses was quite poor: Average kappas for items ranged from 0.18 to
0.27. Mean profile rs assessing recall for the overall pattern of behavior were .36, .30, .33, and .44
for these domains, respectively. In recall, participants overestimated their negative affect and the
number of cigarettes they had smoked during the lapse, and their recall was influenced by current
smoking status. The findings suggest caution in the use of recall in research and intervention.
Clinicians and behavioral scientists rely on participants' retrospective
reports for their information. Many inquiries request summaries of
respondents' behavior to yield estimates of event frequencies or accounts
of typical behavior. Accurate retrieval of such "generic personal memories"
(Brewer, 1994) requires that respondents not only recall relevant data but
also summarize it. Other inquiries focus on specific, episodic personal
memory, or recall of a particular episode (Brewer, 1994), with no requirement
for summary. Examples abound: research on the details of medical encounters,
traumatic experience, pain, and recall of addiction relapse episodes, which
is our focus here. Clinical practitioners rely even more heavily than
researchers on recall of events.
Research on autobiographical memory suggests, however, that
recall of events is highly prone to error and bias. This literature
suggests that recollection is not simply the direct retrieval of
information from a decaying archive. Instead, memory seems
to rely on heuristic strategies to reconstruct recalled events (see
Bradburn, Rips, & Shevell, 1987). Recall for a particular epi-
sode can be disrupted by interference from similar events that
have occurred either before or after the episode (Bradburn et
al., 1987). Recall of particular events is often guided, and distorted, by
schemata that describe a prototypal class of events (e.g., going to the
doctor or eating in a restaurant) and is also distorted by respondents'
preconceived notions or mental models about the event or by their attempts
to salvage their self-esteem or create a seemingly coherent and consistent
narrative (Ross, 1989). Recall, thus, can be modified and biased by events
that occur after the index event; for example, couples who have since
developed marital difficulties experience distorted recall such that
episodes early in their relationship seem more negative than they were
(Holmberg & Holmes, 1994). Memory can also be influenced heavily by
participants' current state, so that memories are more negatively tinged
and negative events are more easily recalled when participants are
emotionally distraught (Teasdale & Fogarty, 1979).
Saul Shiffman, Michael Hufford, Mary Hickcox, Jean A. Paty, Maryann Gnys,
and Jon D. Kassel, Department of Psychology, University of Pittsburgh.
This research was supported by Grant DA06084 from the National Institute
on Drug Abuse.
We are grateful to Michael Sayette and Jonathan Schooler for their helpful
comments on earlier drafts of this article. We are also grateful to Celeste
Elash, Walter Perz, and Thomas Richards for their data management and
analysis assistance and to Stephanie Paton and Yolanda DiBucci for their
administrative assistance.
Correspondence concerning this article should be addressed to Saul
Shiffman, University of Pittsburgh, Department of Psychology, Bellefield
Professional Building, 130 North Bellefield Avenue, Suite 510, Pittsburgh,
Pennsylvania 15260.
The accuracy (or inaccuracy) of recall is not uniform. For
example, accuracy is generally worse over longer recall intervals
(Brown, Rips, & Shevell, 1985) and better when events are salient, that
is, unique, important, or emotionally charged (Eisenhower, Mathiowetz, &
Morganstein, 1991). Recall content is also relevant: Recall of dates is
quite poor (see Friedman, 1993), and objective facts are better recalled
than subjective states (Brewer, 1988).
Recall processes may introduce not only inaccuracy (i.e., random error)
but also bias (i.e., systematic error) into the recalled material. For
example, depressed patients systematically overestimate the occurrence of
negative life events (Clark & Teasdale, 1982) because such events are
currently salient and consistent with their view of life, thus exaggerating
the apparent association between negative life events and depression.
Accordingly, it is imperative to evaluate the accuracy of retrospective
recall as used in research and clinical contexts. There are scant data
evaluating recall of particular events, perhaps because of the difficulty
of obtaining a contemporaneous record of the original event.
One area that has relied heavily on recall of specific events is the study
and treatment of addiction relapse, where assessment focuses on the details
of relapse episodes (Brownell, Marlatt, Lichtenstein, & Wilson, 1986).
Several studies, for example, have examined the circumstances of initial
relapse episodes (lapses) among cigarette smokers (see Sutton, 1992). This
provides a useful context for evaluating the validity of recall for such
episodic personal memories, as smoking lapses may represent personally
salient events and smokers typically claim clear recall of lapse episodes.
In this article, we use recall of first lapses to evaluate the validity of
retrospective reports of specific life events.
Studies of smoking lapses have yielded a consistent picture of lapses,
emphasizing the role of negative affect, smoking cues, alcohol consumption,
and coping responses. Participants' emotional reactions to lapses are also
considered prognostically important (Marlatt & Gordon, 1985). Besides
lapses, investigators have been interested in the characteristics of "near
miss" situations, temptations that do not result in a lapse (e.g., Shiffman,
Paty, Gnys, et al., 1996). Both kinds of accounts are used heavily in
clinical assessment. Unfortunately, information regarding smoking lapses
comes largely from retrospective recall of lapse episodes after long
periods. Given the vagaries of autobiographical memory, the accuracy and
freedom from bias of these reports are open to question. Indeed, studies
have cast doubt on their validity. McKay, O'Farrell, Maisto, and Connors
(1989) have shown that attributions for lapses change over time, and
Hodgins, el-Guebaly, and Armstrong (1995) recently showed that recall of
affective state during an alcohol lapse is influenced by mood at the time
of recall.
We assessed accuracy and bias in recall of first lapse episodes
by comparing retrospective accounts of these episodes, collected
12 weeks after the fact, with data collected soon after their
occurrence by participants using palm-top computers to record
their experiences; 89% of participants reported recording their
data within 15 min of the episode (Shiffman, Paty, Gnys, et al.,
1996). We also assessed recall of participants' most intense
temptation episode; such assessments have often been used as
a comparison for lapse episodes. Accuracy of recall was as-
sessed using measures of association that tapped correspondence
between the two data sources while being insensitive to mean
differences between them. Because participants might fail to
recall individual facts about the episode while still retaining the
overall picture, we also used profile correlations, which assess
the overall match in the pattern or profile of data (Cronbach &
Gleser, 1953), to compare recorded and recalled data. We used
these coefficients to examine moderators of recall accuracy; for
example, to test whether smoking status at the time of recall
might affect recall, perhaps because of retroactive interference.
Finally, we evaluated bias in recall by assessing systematic mean
differences between recorded and recalled data, focusing on reports of
amount smoked and abstinence violation effects (a constellation of negative
reactions to a lapse; see Marlatt & Gordon, 1985), which seemed most subject
to retrospective distortion.
Method
Participants
Participants were 127 smokers, a subset of 215 smokers who quit smoking for
at least 24 hr while enrolled in a smoking cessation research program.
(Selection of participants is described later.) Participants were recruited
through advertisements for smoking cessation treatment and were paid $50.
To qualify, participants had to smoke at least 10 cigarettes per day for at
least 2 years. For other screening criteria, see Shiffman, Paty, Gnys, et al.
(1996). Ninety-five percent of the participants were Caucasian, and 58% were
women. Seventy-three percent of the participants had completed at least some
college education; 36% had a college degree. Participants averaged 41.5
(SD = 10.2) years of age and had been smoking for an average of 22.66
(SD = 9.9) years. At enrollment, participants reported smoking an average of
27.93 (SD = 13.5) cigarettes per day (see Shiffman, Paty, Gnys, et al., 1996).
Procedure
Monitoring.
Participants were trained in the use of the palm-top computer, or Electronic
Diary (ED), and were asked to monitor ad lib smoking for 2 weeks leading up
to a target quit date (TQD), at which point they began monitoring
temptations. Monitoring of abstinence experience continued for up to 25 days
after TQD. Participants were considered quit after abstaining 24 hr; they
were then eligible to record lapses, which they were asked to do as soon as
possible after any smoking episode. Each evening, participants were
additionally asked to report any lapse or temptation episodes they had failed
to record in real time. Only participants who explicitly denied failing to
enter their first lapse in real time are included in analyses of lapse
episodes.¹ Participants were not informed that they would later be asked to
recall the details of temptation and lapse episodes. Throughout the
monitoring period, ED also prompted participants at random times for
assessments approximately 4 to 5 times per day; participants responded within
2 min to 90% of these prompts. For potential verification of smoking status,
breath and saliva samples (for carbon monoxide and cotinine assay,
respectively) were collected at each clinic visit (roughly weekly); to create
a bogus pipeline effect, we did not inform participants of limits in the
sensitivity of these assays. Participants' monitoring records were also
reviewed for compliance at every visit, and they were given compliance
feedback to maintain performance. Participants were provided with
cognitive-behavioral treatment during the entire monitoring period (see
Shiffman, Paty, Gnys, et al., 1996, for details).
1 Participants who failed to record lapse episodes in real time scored
significantly higher in social desirability (Shiffman et al., in press);
thus, participants in the present analyses scored lower on social
desirability and seemed willing to record their "failures."
Lapses.
After quitting, participants were instructed to initiate a computer entry if
they lapsed, immediately after the lapse episode had ended. A lapse comprised
any occasion of smoking, even if only a puff. A single lapse episode could
encompass more than one cigarette if several were smoked at one sitting. On
average, the first lapse occurred 5.37 (SD = 5.68) days after quit and 72.44
(SD = 13.67) days before recall. Most lapses (72%) comprised smoking one
cigarette or less (e.g., several puffs); only 5% of lapses involved more than
two cigarettes (Shiffman, Paty, Gnys, et al., 1996).
Temptations.
Participants were asked to initiate a recording each time they experienced a
temptation, defined as an acute rise in urge to smoke or an occasion in which
participants felt they had come to the brink of smoking, regardless of
subjective urge. On average, the temptation at highest urge occurred 4.01
(SD = 5.67) days after quit and 71.17 (SD = 15.64) days before recall. Those
who lapsed had entered an average of 23.23 (SD = 24.94) temptations before
lapsing.
Follow-up and recall.
Participants who completed the computerized monitoring phase of the study
were invited to participate in a follow-up meeting, which was scheduled an
average of 82.04 days (SD = 14.85) after the TQD. Smoking status was reported
and biochemically verified at the follow-up. Participants completed a
calendar reporting all smoking during the follow-up period. Before the
follow-up, participants
were sent questionnaires soliciting recall of lapses and temptations.
Participants who had lapsed during monitoring were asked to recall the
details of their first lapse episode; all participants were asked to recall
their most intense temptation episode during the monitoring period.
Participants described each episode using a questionnaire modeled closely
after the episode inquiry that was programmed into ED: The written questions
used the same wording, sequence, and response scales as the ED. Participants
were asked to recall the date and time of the episode. If the date did not
approximately match the date of the lapse recorded on ED, participants were
later provided the date and time of the first lapse as recorded in ED data
and were asked to focus their recall on that episode. (This was meant to
address the concern that recall might seem inaccurate if participants
recalled [accurately] a different lapse episode.) Subsequent analysis showed
that the cued recall was generally less accurate than free recall;
accordingly, we focus on data from free recall of these episodes.
Measures
ED assessments.
After each episode reported to ED, participants completed an assessment of
the situation.² Items used 4-point scales (typically "NO!!, no??, yes??,
YES!!"; Meddis, 1972) unless otherwise indicated. (See Shiffman, Paty, Gnys,
et al., 1996, for more details on the assessment.) The assessment covered
five domains: mood, activities, triggers, abstinence violation effects (AVE),
and coping. For the mood domain, participants rated 14 mood items derived
from the circumplex model of affect (Larsen & Diener, 1992), as well as from
the fourth edition of the Diagnostic and Statistical Manual of Mental
Disorders (DSM-IV; American Psychiatric Association, 1994) criteria for
tobacco withdrawal: restless, tired, energetic, spacey, having difficulty
concentrating, happy, irritable, miserable, tense, contented, sad, and
frustrated or angry, on a scale ranging from 1 to 4. Participants also rated
their overall energy level and overall affect on a scale ranging from 1 to 5.
For the activities domain, participants used yes-no responses to report their
activities by endorsing one or more activities from a list (on the telephone,
leisure, working, inactivity, interacting with others, alone, eating food,
coffee, alcohol, other activity). For the triggers domain, participants
identified the factor (or factors) that triggered the temptation or lapse by
endorsing one or more triggers from a list of nine (bad mood, stress, good
mood, smoking cues, eating or drinking, relaxing, boredom, in transition, and
other triggers) and by indicating the single most important trigger. For AVE
assessment, participants were asked to characterize their reactions to the
lapse or temptation, reporting whether they felt encouraged, their confidence
to continue abstaining, whether they felt guilty, whether the episode was
their fault, and whether they felt like giving up their efforts to abstain.
They also rated their attributions for the cause of the episode on three
dimensions (Marlatt & Gordon, 1985): internality (outside me-inside me),
controllability (controllable-uncontrollable), and stability
(changing-unchanging). For the coping domain, participants reported any
attempts they made to cope with the temptation to smoke (immediate coping;
Wills & Shiffman, 1985), recording whether they had performed behavioral
coping, cognitive coping, both, or neither. Participants also reported how
much they smoked during a lapse.³
Reliability and validity of diary data.
The reliability and validity of
ecological momentary assessment data (EMA; Stone & Shiffman, 1994)
have been established in many settings (e.g., Csikszentmihalyi & Larson,
1987; McFall, 1977) both for discriminating between persons and be-
tween situations. ED data from the present study demonstrated reliability
and validity (detailed analyses available from Saul Shiffman).
Smoking status variables.
We used data from a smoking history form completed before the study and the
smoking calendar at follow-up to assess participants' smoking history. From
this, we derived information such as the number of previous quit attempts
(and, thus, failures), the amount and rate of smoking since the lapse, and
the number of times the participant moved between abstinence and smoking in
the current quit attempt. This transition between smoking and abstinence was
considered to have occurred when a participant smoked after having been
abstinent for ≥24 hr.
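The transition count described above can be illustrated with a short sketch. It assumes, hypothetically, that each smoking occasion is available as a timestamp in hours on a common clock; the study derived its counts from a smoking calendar whose resolution is not specified here, and the function and parameter names are illustrative only. A smoking occasion counts as a transition when at least 24 hr have passed since the previous cigarette (or since the last cigarette before quitting).

```python
from typing import Sequence


def count_relapse_transitions(last_prequit_cig_hr: float,
                              smoking_times_hr: Sequence[float],
                              min_abstinence_hr: float = 24.0) -> int:
    """Count smoking occasions preceded by at least `min_abstinence_hr` smoke-free hours.

    Hypothetical helper: times are hours on a common clock, and abstinence is
    measured from the last cigarette smoked before quitting.
    """
    transitions = 0
    previous = last_prequit_cig_hr
    for t in sorted(smoking_times_hr):
        if t - previous >= min_abstinence_hr:
            transitions += 1  # smoked again after a qualifying abstinence period
        previous = t          # every cigarette resets the abstinence clock
    return transitions


# Hypothetical record: first lapse at hour 130, another cigarette 2 hr later,
# then a third occasion 68 hr after that.
print(count_relapse_transitions(0.0, [130.0, 132.0, 200.0]))  # 2
```

Under this rule, a second cigarette smoked shortly after a lapse does not add a transition; only a return to smoking after another 24-hr abstinent stretch does.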
ED System Hardware and Software
The ED system was implemented on a PSION Organizer II LZ 64
(5.6 in. × 3.1 in. × 1.1 in. [14.22 cm × 7.87 cm × 2.79 cm]; 8.8 oz.
[249.48 g]; PSION, Ltd., London, England) hand-held computer, using
software developed specifically for this project. A four-line, 20-character
LCD screen presented questions, and a simple user interface recorded
responses. For each question displayed on screen, participants scrolled
through alternative responses, pressing an "enter" key to select the
preferred response. Each entry was recorded along with the day and
time, which precluded mass recording of delayed entries. The system
is described in more detail elsewhere (Shiffman, Paty, Gnys, et al., 1996;
Shiffman, Paty, & Kassel, 1996).
Data Reduction and Analysis
Of 215 participants who quit smoking, 127 were included in the
present analyses of lapses or temptations; 32 were included in both. For
lapses, we used data from 87 participants who recorded their first lapse
in real time on the ED and also completed a lapse recall form at follow-
up. For recall of temptations, we focused on 72 participants who had
completed a temptation follow-up form and who had recorded a uniquely
intense temptation episode on ED (i.e., the participant recorded only
one episode at this maximal urge level). These episodes were taken to
be referents for later recall and formed the basis of our detailed analyses.
These analyses exclude data from 73 participants who had recorded
multiple temptation episodes at the peak urge rating, making the referent
for recall ambiguous. However, we did analyze multiple-episode data
from 42 of these participants who had recorded two to four episodes
(M = 2.81, SD = 0.83) at the peak urge rating. Recall accuracy for these
episodes, averaged across episodes, resembled that observed for unique
episodes (reported later). We did not analyze data from 31 participants who
reported five or more temptations (M = 14.42, SD = 11.29) at the same maximal
urge intensity.
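The referent-selection rule above can be expressed as a small classification function. This is a sketch only: the `(episode_id, urge_rating)` layout and the function name are hypothetical, and ties are detected by exact equality of urge ratings.

```python
from typing import List, Optional, Tuple


def select_referent(episodes: List[Tuple[str, int]]) -> Tuple[str, Optional[List[str]]]:
    """Classify temptation episodes by how many share the peak urge rating.

    episodes: hypothetical (episode_id, urge_rating) pairs for one participant.
    Returns ("unique", [id]), ("multiple", ids), ("excluded", None), or ("none", None).
    """
    if not episodes:
        return "none", None
    peak = max(urge for _, urge in episodes)
    at_peak = [eid for eid, urge in episodes if urge == peak]
    if len(at_peak) == 1:
        return "unique", at_peak      # one unambiguous referent for recall
    if len(at_peak) <= 4:
        return "multiple", at_peak    # 2-4 ties: accuracy averaged across them
    return "excluded", None           # 5+ ties: referent too ambiguous to analyze
```

Only the "unique" case supplies a single referent; the "multiple" case corresponds to the supplementary analysis averaged across tied episodes.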
Lapse and temptation episodes were analyzed separately. Recall data were
compared with ED data in several ways. To assess recall accuracy, we computed
correlations and Cohen's kappa for the two data sources for each variable.
(For dichotomous variables, the correlation is the phi coefficient, which is
closely related to kappa and equals it when the two sources endorse the item
at the same rate. For other scores, kappa is a much more stringent test of
agreement, as it requires exact agreement and does not credit near-misses or
systematic, consistent errors.) To summarize the accuracy of recall, we also
computed profile rs for four domains of the assessment; each was computed for
a single participant across two arrays of variables, one from recall and one
from ED data. These coefficients reflect the degree to which the pattern or
profile of responses is similar across the two assessments, without being
sensitive to differences in numeric agreement or to mean scores (Cronbach &
Gleser, 1953). Thus, for each domain, each participant was assigned a
coefficient that summarized the similarity between profiles for the two data
sources. Figure 1 demonstrates how the profile rs capture the overall
correspondence of two data arrays.
Figure 1. Mood profile rs from three participants representing low, medium,
and high accuracy. ED = Electronic Diary; Hard Conc. = hard time concentrating.
The profile rs also provided scores that summarized recall accuracy for each
participant. These were, in turn, correlated with potential moderators of
accuracy such as length of the recall interval, confidence in recall, smoking
status at follow-up, and so forth. To test for systematic bias in recall, we
computed the differences between recall and ED data. We computed one-sample
t tests and also correlated the difference with variables (e.g., smoking
status at recall) that were hypothesized to affect recall bias.
2 Some participants were in a reduced-burden control group that received
fewer and shorter assessments. Thus, some responses are not available for
these participants.
3 ED reports of urge and craving in lapse and temptation episodes are not
analyzed because the format of ED and follow-up inquiries differed
substantially.
Results
Reports of Lapse and Temptation Episodes
Participants reported that they had recorded lapses an average of 9.44 min
(Mdn = 1.0, SD = 19.87) after the lapse ended. Temptations were said to have
been recorded after a lag of 8.42 min (Mdn = 2.50, SD = 19.20). The reported
lag to recording was not influenced by participants' age or gender, nor was
it correlated with social desirability or related to any measure of recall
accuracy.
Episode Dating
Few participants could accurately remember the day on which their lapse or
temptation episode occurred. On average, participants missed the day of the
lapse episode by 14.5 days (SD = 19.8) and the temptation episode by 28.4
days (SD = 25.2), using the absolute number of days from the actual date,
whether earlier or later. For lapses, 23% of the participants recalled the
correct day, and 57% recalled the correct day ±1 week. For temptations, 6% of
participants recalled the correct day, and 29% were accurate ±1 week. Some
estimates were wildly inaccurate: 5% of participants reported their first
lapse as occurring on a day before their actual quit day; 25% estimated their
first lapse to have occurred after the 4-week monitoring period had ended;
and 57% of temptations were attributed to a date that fell after the
monitoring period. Dating errors were asymmetric: Participants tended to
"telescope" recall, bringing the date closer to the date of follow-up. The
signed recall error (expected to be 0 if dating errors were symmetric) was
13.6 days later than the recorded event for lapses (SD = 20.40), t(86) = 6.10,
p < .001, and 25.55 days later than the recorded event for temptations
(SD = 28.09), t(71) = 7.61, p < .001 (one-sample t tests vs. 0).
Accuracy of Episode Recall
Lapses.
Recall accuracy for lapses was poor: Item correlations between recalled and
ED data were consistently modest, averaging .32 across all data items. The
mean item correlations for the domains of mood, activities, triggers, and AVE
assessment were .36, .24, .28, and .34, respectively (Table 1). Thus, even for
mood, which showed the best results, recall captured only 13% of the variance
in the original data. Cohen's kappa yields slightly worse results,
particularly for the mood data, where kappa averaged 0.18 (not shown).
The poor recall of lapse details extended to the variables of most interest
to students of the relapse process. For example, recall of coping was poor,
with only 45% agreement (κ = 0.24). Recall of the single most important
trigger corresponded to ED entries less than one third of the time (32%,
κ = 0.19). Recalled mood showed only modest correspondence with real-time
data. Alcohol consumption was the most accurately recalled variable we tested
(κ = 0.63); 83% of participants correctly recalled drinking, and only 9% of
those who recorded no drinking later recalled drinking.
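For a yes-no item such as recalled drinking, percent agreement, Cohen's kappa, and the phi correlation can all be computed from the same 2 × 2 table of ED versus recall codings. The sketch below uses made-up codings (not study data) to show how kappa discounts the agreement expected by chance, and how kappa and phi coincide only when both sources endorse the item at the same rate. The function name is hypothetical; it assumes both codings vary, since kappa and phi are undefined otherwise.

```python
import math
from typing import Sequence, Tuple


def agreement_stats(ed: Sequence[int], recall: Sequence[int]) -> Tuple[float, float, float]:
    """Percent agreement, Cohen's kappa, and phi for two binary (0/1) codings of one item."""
    n = len(ed)
    if n == 0 or n != len(recall):
        raise ValueError("need two non-empty sequences of equal length")
    a = sum(1 for x, y in zip(ed, recall) if x == 1 and y == 1)  # both yes
    b = sum(1 for x, y in zip(ed, recall) if x == 1 and y == 0)  # ED yes, recall no
    c = sum(1 for x, y in zip(ed, recall) if x == 0 and y == 1)  # ED no, recall yes
    d = n - a - b - c                                             # both no
    p_obs = (a + d) / n
    p_ed, p_rec = (a + b) / n, (a + c) / n
    p_chance = p_ed * p_rec + (1 - p_ed) * (1 - p_rec)           # agreement expected by chance
    kappa = (p_obs - p_chance) / (1 - p_chance)
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return p_obs, kappa, phi


ed = [1] * 10 + [0] * 10
same_rate = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8   # recall says "yes" as often as ED
over_recall = [1] * 15 + [0] * 5                     # recall says "yes" more often than ED
print(agreement_stats(ed, same_rate))    # agreement 0.8; kappa and phi both about 0.6
print(agreement_stats(ed, over_recall))  # agreement 0.75; kappa about 0.5, phi about 0.58
```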
Table 1
Summary of Correlations Between ED Records and Recall at Follow-Up
                          Lapses                          Temptations
Domain and statistic      r             Mean difference   r             Mean difference
Mood adjectives (1-4)
  Average (SD)            .36 (.10)     -0.35 (0.23)      .27 (.11)     -0.42 (0.34)
  Range                   .19 to .49    -0.67 to 0.07     .07 to .40    -0.90 to 0.29
  % negative              0.00          83.33             0.00          83.33
Activities (no-yes)
  Average (SD)            .24 (.16)     -0.07 (0.07)      .06 (.20)     -0.09 (0.10)
  Range                   .09 to .64    -0.15 to 0.08     -.23 to .36   -0.18 to 0.08
  % negative              0.00          90.00             40.00         70.00
Triggers (no-yes)
  Average (SD)            .28 (.14)     -0.07 (0.07)      .14 (.14)     -0.07 (0.12)
  Range                   .07 to .54    -0.19 to 0.01     -.05 to .28   -0.25 to 0.07
  % negative              0.00          88.89             11.11         66.67
AVE (1-4)
  Average (SD)            .34 (.15)     -0.04 (0.35)      .13 (.20)     -0.22 (0.33)
  Range                   .04 to .51    -0.72 to 0.44     -.07 to .56   -0.67 to 0.30
  % negative              0.00          50.00             25.00         75.00
Coping (no-yes)
  Average (SD)            .21 (.05)     0.04 (0.06)       -.01 (.26)    0.12 (0.00)
  Range                   .17 to .24    0.00 to 0.08      -.19 to .17   0.12 to 0.12
  % negative              0.00          0.00              50.00         0.00
Note. Ns range from 69 to 87. Statistics reflect summaries across items within
each domain; "% negative" is the percentage of items in the domain with a
negative value. Mean differences were computed as the Electronic Diary (ED)
report minus the report given at follow-up, so negative values indicate that
follow-up > ED report. A table showing detailed data for individual items is
available on request from Saul Shiffman. AVE = abstinence violation effects.
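The Table 1 bias summaries can be sketched per domain from item-level differences. The sketch assumes, consistent with the table note, that each item's difference is the ED report minus the follow-up report, so a negative value means recall exceeded the ED record; it also assumes that "% negative" counts items whose mean difference is below zero. The dictionary layout and function name are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def domain_bias_summary(items: Dict[str, Tuple[List[float], List[float]]]) -> Tuple[float, float]:
    """Summarize recall bias across the items of one domain.

    items maps a hypothetical item name to (ED reports, follow-up reports),
    paired by participant. Each item's mean difference is ED minus follow-up,
    so a negative value means recall exceeded the ED record.
    Returns (average item mean difference, % of items with a negative mean difference).
    """
    item_means = [statistics.fmean(e - f for e, f in zip(ed, fu)) for ed, fu in items.values()]
    pct_negative = 100 * sum(m < 0 for m in item_means) / len(item_means)
    return statistics.fmean(item_means), pct_negative


# Made-up ratings: "tense" is overstated at recall, "happy" is understated.
print(domain_bias_summary({"tense": ([1, 2], [2, 3]), "happy": ([3, 3], [2, 2])}))  # (0.0, 50.0)
```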
We also assessed the match between recall and real-time re-
cording through profile rs, which do not require agreement or
correlation on any given item within a domain but rather assess
the degree to which the pattern of responses is matched. The
profile rs averaged .36, .30, .33, and .44 for the four domains
of mood, activities, triggers, and AVE, respectively (Table 2).
The magnitude of the correlations is modest, echoing the results of the
individual item correlations. The average rs were not significantly
different from one another (by dependent t test): The four domains were
recalled with comparable accuracy.
Temptations.
Accuracy of recall for the most intense temptation seemed generally worse
than recall of the first lapse. More of the correlation coefficients were
negative (see Table 1), and average item correlations for each domain were
generally lower, some near zero. Recall of the major triggers of temptations
was again poor, with 28% agreement (κ = .12), as was recall of coping
(κ = .11). For temptations, profile rs averaged .28, .13, .18, and .26 for
mood, activities, triggers, and AVE domains, respectively (see Table 2).⁴
Dependent t tests contrasting the mean profile r across the four domains
revealed that activities were recalled more poorly than AVE or mood
(ps < .05).
Comparisons of lapses and temptations.
Dependent t tests
were used to contrast the profile rs from temptations and lapses
(among the 32 participants who had data for both episodes).
Although profile rs for temptations were uniformly lower, only
recall of activities differed significantly between lapse and temp-
tation recall, t(31) = 2.1, p < .05.
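The dependent t tests used here, and the one-sample t tests against 0 used for dating errors and recall bias, reduce to the same computation: a one-sample t statistic on within-person differences. A minimal sketch returning t and its degrees of freedom; p values are omitted, and in practice a library routine such as `scipy.stats.ttest_rel` or `scipy.stats.ttest_1samp` would supply them. Function names are illustrative.

```python
import math
import statistics
from typing import Sequence, Tuple


def one_sample_t(values: Sequence[float], mu0: float = 0.0) -> Tuple[float, int]:
    """t statistic and degrees of freedom for H0: mean(values) == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (statistics.fmean(values) - mu0) / se, n - 1


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Dependent (paired) t test: a one-sample t test on within-person differences."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    return one_sample_t([a - b for a, b in zip(x, y)])
```

With 32 participants contributing both a lapse and a temptation profile r, `paired_t` would report df = 31, matching the degrees of freedom given above.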
Individual Differences in Recall Accuracy
Consistency in recall accuracy.
We tested whether individu-
als were consistent in their accuracy of recall across domains,
by correlating the profile rs for each domain with the profile
rs of the other three domains. Overall, the correlations were low, averaging
.11 for lapses and .10 for temptations, and yielding internal consistency
coefficients (Cronbach's alpha) of .34 and .30, respectively. We also
evaluated whether participants'
accuracy for each domain was correlated across lapses and
temptations. Only accuracy of mood recall in temptations was
correlated with the accuracy of mood recall in lapses (r = .58;
Table 2). Thus, participants' accuracy of recall tended to vary
both across temptations and lapses and across domains, preclud-
ing the use of a single summary score of recall accuracy. Accord-
ingly, further analyses of recall accuracy were conducted sepa-
rately for each domain. We focused our subsequent analyses
on recall of lapses, because these were more important, less
ambiguous events, and our sample was larger.
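Treating the four domain profile rs as items, Cronbach's alpha summarizes how consistently a participant's accuracy carries across domains. The sketch shows both the raw formula and the standardized form, which depends only on the mean inter-domain correlation; the article does not say which variant it reports. With k = 4 domains, the standardized form gives about .33 for a mean correlation of .11 and about .31 for .10, near the reported .34 and .30 (differences may reflect rounding of the averages or use of raw alpha). Function names are illustrative.

```python
import statistics
from typing import List


def standardized_alpha(mean_r: float, k: int) -> float:
    """Standardized Cronbach's alpha from the mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Raw Cronbach's alpha; rows are participants, columns are items (here, domains)."""
    k = len(scores[0])
    item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


print(round(standardized_alpha(0.11, 4), 2))  # 0.33
print(round(standardized_alpha(0.10, 4), 2))  # 0.31
```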
Moderators of accuracy.
We evaluated hypothesized moderators of recall accuracy, as indexed by
profile rs for lapse episodes. On average, participants indicated that they
had relatively high confidence in their ability to recall their prior
episode (M = 3.33, SD = 0.76, on a scale ranging from 1 to 4). Neither the
respondents' confidence in recall nor the length of the recall interval
(M = 72.44 days, SD = 13.67) was reliably related to accuracy (rs < .23,
ps > .27). There were no gender or age differences in accuracy (|rs| < .19,
ps > .08). More education was associated with better recall of mood
(r = .26, p < .05), but not of other domains (rs < .14, ps > .25).
4 These statistics are based on participants who had a single uniquely
severe or intense temptation. Results were similar for average profile
correlations of multiple temptation episodes from participants who re-
ported 2-4 episodes at the peak urge intensity.
Table 2
Profile Correlations: Means and Interrelationships
              Lapsesᵃ                   Temptationsᵃ
Domain        M (SD)      95% CI        M (SD)      95% CI        rᵇ
Mood          .36 (.39)   .28-.45       .28 (.43)   .18-.38       0.58*
Activities    .30 (.43)   .20-.39       .13 (.34)   .05-.21       0.00
Triggersᶜ     .33 (.39)   .23-.42       .18 (.42)   .07-.29       -0.16
AVEᶜ          .44 (.34)   .35-.52       .26 (.53)   .12-.39       0.25
Note. CI = confidence interval; AVE = abstinence violation effects.
ᵃ Ns range from 70 to 86 for lapses and from 60 to 72 for temptations. Some
participants were in reduced-burden control groups that were not asked some
of the episode questions in certain domains.
ᵇ Correlations between profile rs, computed after transforming the profile rs
to Fisher's zs. Ns for correlations range from 29 to 32.
ᶜ Change in affect variable was not included in profile r computations.
*p < .01.
We hypothesized that a history of prior lapse experiences
might impede recall accuracy because of confusion among mul-
tiple lapse episodes. Accordingly, we contrasted the recall of
participants with no previous quit history (n = 18) to those
with such histories (n = 69). Those participants who had never
before quit, and thus never relapsed, were more accurate in their
recall of their AVE reactions (mean AVE profile r = .71 for never-quitters
vs. .37 for previous quitters), t(66) = -3.4, p < .001.
Among participants with previous quit attempts, no relationship
emerged between the number of previous quit attempts and recall
accuracy (all rs < .15, ps > .20).
Lapse recall might also be weakened by retroactive interference from later
smoking occasions that might easily be confused with the target episode.
Thus, participants who had a second lapse on the same day (n = 23) were
expected to show poorer recall (lower profile rs); this was not confirmed
(ts < 1.44, ps > .16). We also expected that participants who experienced
multiple transitions from abstinence to smoking would demonstrate poorer
recall. However, the number of times each person reported abstinence of at
least 24 hr followed by smoking was unrelated to recall accuracy
(|rs| < .13, ps > .22). We hypothesized that participants who smoked more
after that first lapse might have worse recall because of more chances to
confuse the experience of the first lapse with subsequent smoking occasions.
Indeed, the more days of smoking the participant reported since the lapse,
the less accurate the participant was in recalling mood (r = -.27, p < .05),
although not activities, triggers, or AVE (rs < .20, ps > .10). Finally,
participants who had been abstinent in the 14 days before follow-up were
better able to recall mood (profile rs: abstinent M = 0.62, SD = 0.26;
nonabstinent M = 0.32, SD = 0.40), t(84) = -2.4, p < .05, and activities
(abstinent M = 0.62, SD = 0.38; nonabstinent M = 0.24, SD = 0.41),
t(84) = -3.0, p < .01.
We also considered that lapses occurring in distinctive and
atypical affect states might be more memorable. To characterize
how typical or deviant a participant's affect was during the
lapse, we standardized the reported negative affect score relative
to the participant's baseline average and standard deviation (re-
moving the sign to equate unusually negative and unusually
positive affect). This index was unrelated to accuracy (rs <
.14, ps > .21). We also considered that lapses occurring after
a longer period of abstinence might be more salient and thus
more accurately recalled; this was not confirmed (rs < .12, ps
> .35).
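The affect-typicality index described above is, in effect, an absolute within-person z score. A minimal Python sketch, assuming the participant's baseline negative-affect ratings are available as a list and that the sample (n - 1) standard deviation is used; the paper does not specify the SD formula, and the function name and ratings are hypothetical:

```python
from statistics import mean, stdev

def affect_deviance(lapse_na, baseline_na):
    """Absolute z score of negative affect during a lapse relative to the
    participant's own baseline ratings. Dropping the sign treats unusually
    negative and unusually positive affect as equally atypical."""
    sd = stdev(baseline_na)  # sample SD (assumption); needs >= 2 ratings
    if sd == 0:
        raise ValueError("baseline ratings have no variability")
    return abs(lapse_na - mean(baseline_na)) / sd

# Hypothetical baseline ratings: mean 5, sample SD about 0.71.
baseline = [4, 5, 6, 5, 5]
print(round(affect_deviance(6.5, baseline), 2))  # 2.12
print(round(affect_deviance(3.5, baseline), 2))  # 2.12
```

The two calls return the same value because a lapse 1.5 points above baseline and one 1.5 points below are treated as equally deviant.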
Systematic Differences Between Recorded and Recalled Data
The preceding analyses focused on the correspondence be-
tween patterns of recalled and recorded data but not on mean
differences, which reflect the operation of systematic biases in
recall. Results of mean comparisons between recall and records
of lapses suggest the influence of broad method effects (Table
1). Participants endorsed higher values for most variables on
the recall questionnaire than on the ED. This overall trend, which
probably reflects differences in response format between the
computer-recorded real-time data and the paper-and-pencil recall
data, makes interpretation of these differences difficult.
Nevertheless, the data suggest some systematic recall bias.
Examination of ED-recall differences for mood items reveals
a trend for participants' recall to exaggerate their negative mood
during the first lapse; only positive affect items were more
strongly endorsed on the ED. To control for the method effects
noted above, we computed an affect measure that balanced
positively worded (happy, contented) and negatively worded
(tense, frustrated-angry) items. These negative affect scores
were significantly higher at recall than on the ED (recall,
M = 5.9, SD = 1.4; ED, M = 4.8, SD = 1.8), t(85) = -4.9,
p < .0001, demonstrating a bias to recall more negative affect.
Participants also retrospectively reported much more smoking
than they had recorded soon after the lapse (recall, M = 2.80
cigarettes, SD = 3.0; ED, M = 0.81, SD = 0.66), t(81) = -5.89,
p < .0001. We considered whether this could be due to
respondents combining smoking from multiple adjacent episodes
in their recall; however, those who experienced multiple
lapses on the day of their first lapse did not differ from those
who reported only one on that day (recall-ED difference for
those with one lapse, M = 1.93 cigarettes, SD = 2.96; for
multiple lapses, M = 2.24, SD = 3.48), t(80) = .40, ns.
Systematic Bias Caused by Smoking Status at Time of Recall
Because experience after an event can color recall, we considered
whether participants' smoking status at the time of recall
biased their recall of lapses. AVE variables were thought to be
particularly vulnerable to recall bias, given that smoking experi-
ence after a lapse bears logically on attributions for the lapse
episode. We examined the relationship between recall bias (i.e.,
recalled vs. recorded AVE values) and smoking status at recall,
operationalized as the number of days of smoking in the 14
days before follow-up. As hypothesized, participants who had
more smoking days exaggerated in retrospect how much the
lapse had made them feel like giving up their quit effort (r =
.30, p < .05); smoking status was not related to recall bias for
any other individual AVE item (rs < .20, ps > .12).
Discussion
This study evaluated the accuracy and bias of smokers' recall
of a first lapse to smoking. Comparison of recall after a few
months with field recordings of first lapses showed that recall
was generally poor. Across domains, recall correlations averaged
.32; in other words, recall only accurately captured about 10%
of the variance in particular items. Only 10% of correlations
rose above .50. Profile correlations (Cronbach & Gleser, 1953)
assessed whether participants accurately recalled an overall pat-
tern or profile of episode characteristics, even if they could not
recall any particulars very well. The average profile rs were
modest, averaging only .29.
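The figure of about 10% follows from squaring the average item-level correlation, because r² is the proportion of variance that recorded and recalled values share. Squaring an average r only approximates the average of the squared rs, but the arithmetic is:

```latex
r^2 = (0.32)^2 = 0.1024 \approx 10\%
```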
Retrospective reports almost uniformly failed to match field
records reported to have been recorded within minutes of the
lapse episode. Surprisingly, recall was no better for objective,
easily discernible facts (e.g., activity) than it was for subjective
states such as mood. Recall accuracy was poor for the variables
that have been the focus of the clinical and research literature
on relapse episodes (Brownell et al., 1986; Sutton, 1992):
mood, the presence of smoking cues, and the performance of
coping. Only alcohol consumption was recalled with any accuracy.
This was rather surprising, given that recall of drinking
was expected to be blurred by intoxication at the time and by
state-dependent recall effects (Lister, Eckardt, & Weingartner,
1987). Although research has demonstrated reasonable accuracy
for some broad behavioral summaries, such as estimates of the
frequencies of high and low drinking days over 6 to 18 months
recalled with the timeline follow-back method (Sobell & Sobell,
1992), we are aware of no studies with more optimistic
conclusions about accurate recall of specific events. Our findings
raise serious questions about the clinical and scientific
usefulness of episodic recall data regarding addiction relapse and
other clinical phenomena.
Recall for the date of the target event proved to be wildly
inaccurate: The average error was 14 days, in the context of a
25-day monitoring period; 25% of participants were off by at
least 25 days. This is striking, given that participants were en-
rolled in a structured cessation program with a fixed quit date,
which should have facilitated dating of the lapse. However, it is
consistent with past studies concluding that dates are not
accurately recalled (e.g., Brewer, 1988). This finding strongly
counsels caution regarding the use of such retrospective dating
information (e.g., in survival analyses of relapse data). Consistent
with past studies (Thompson, Skowronski, & Lee, 1988), recall
errors were systematic: Participants telescoped the event
forward, closer to the time of recall. Consistent with the literature
(see Bradburn et al., 1987), using episode dates as recall cues
proved ineffective and actually resulted in less accurate
recall of other details.
We examined several moderators of recall accuracy. As pre-
viously reported by Harsch and Neisser (1989), respondents'
confidence in their recall accuracy was not related to their actual
accuracy. Better educated participants showed better recall of
mood, perhaps reflecting greater introspective focus. Few of the
other variables expected to moderate recall accuracy actually
did so. The length of the recall interval was unrelated to accu-
racy, perhaps because of the very limited range of intervals we
examined. Lapse episodes associated with stronger, more un-
usual emotion were not more memorable (Brewer, 1994), nor
were lapses after longer abstinence more accurately recalled.
The difficulty of recalling details over the long interval may
have masked the effect of event salience. Participants who had
made multiple transitions from abstinence to smoking on their
current quit effort did not show poorer recall, as would have been
expected if they had confused multiple events of the same class.
However, participants who were smoking during the 2 weeks before recall did
demonstrate worse recall of mood and activities surrounding the
initial lapse. Those who were quitting smoking for the first
time demonstrated better recall than those who had tried to quit
multiple times, perhaps because the lapse was more salient for
them.
Respondents' use of schemata (essentially stereotypes of the
target situation) to reconstruct an episodic memory can yield
reasonable, believable, but inaccurate accounts of the situation
(Holmberg & Holmes, 1994). To explore whether schematic
reconstruction of smoking lapse situations could account for the
recall data, we compared participants' recall with accounts of
lapse situations solicited from people who had never smoked
and thus had no personal experience on which to base their reports
of lapse characteristics. Using the same questionnaire we used
for the recall data, we solicited expectations of "a typical
smoker's relapse" from 30 adults who had never smoked regularly
(<200 cigarettes in lifetime) and thus had never experienced a
relapse. Profile rs relating the never-smokers' mean data to
those from our participants' retrospective reports were .67, .57,
.74, and .77 for mood, activities, triggers, and AVE, respectively.
In other words, never-smokers were able to reproduce the pattern
of the retrospective data. (In contrast, their reports matched the
ED data less closely: rs = .01, .59, .61, and .16 for the same
domains, respectively.) Thus, the typical pattern of "relapse
data" can be produced without any experiential base at all,
consistent with the idea that participants' recall was influenced
by general beliefs about smoking relapse episodes; this would also
account for the plausibility of participants' inaccurate retrospective
accounts.⁵

5 It may also explain why participants' recalled overall descriptions
of lapse situations roughly match those obtained from real-time
recordings (Shiffman, Paty, Gnys, et al., 1996), even though their
recall of individual lapses is inaccurate.

Although assessments of accuracy were not affected, evaluation
of bias through mean differences between real-time data and
recall was made more difficult by the use of different methods of
data collection for real-time and recall data (computer vs. paper
and pencil). Scrolling through multiple response options on the
ED required some effort and may have biased responses toward
options that appeared first, thus introducing systematic
differences between ED and recall data. Even with this complication
in mind, the data showed evidence of recall bias. Participants
overstated both how upset they felt during the lapse and how
many cigarettes they smoked on that first smoking occasion,
perhaps because most had since relapsed to smoking.⁶
Even more pernicious are biases in which participants' subse-
quent smoking distorts their reports of past events. An example
in this study is the tendency for participants who were smoking
more at recall to exaggerate in retrospect how much they felt
like giving up after their first lapse. These participants apparently
inferred their past behavior and feelings from their present be-
havior, thus introducing spurious associations into the data. The
recall data would support the false conclusion that demoraliza-
tion at the time of the initial lapse leads to more subsequent
smoking--precisely the conclusion drawn from retrospective
data in past studies (Curry, Marlatt, & Gordon, 1987; O'Con-
nell & Martin, 1987; also, cf. Shiffman et al., 1996). Thus,
recall errors can systematically create false associations among
theoretically and clinically relevant variables.
Several elements of the present study limit the strength of our
conclusions. ED recordings were, by design, slightly retrospective:
Participants were to record episodes only when they were
over. Participants reported waiting an average of less than 10
min to record lapses, but they could have been exaggerating
their timeliness. We have no independent confirmation that parti-
cipants actually recorded their first lapse in a timely way. How-
ever, several facts make participants' reports credible. Partici-
pants' reported recording lag was not related to social desirabil-
ity, as would have been expected if some were lying. Also,
participants were given an explicit daily opportunity to record
missed episodes later (in an end-of-day assessment); these anal-
yses included only those who explicitly denied omitting a report
of their first lapse. Frequent biochemical testing also provided
a credible deterrent against failure to report lapses (i.e., a bogus
pipeline). ED records were time tagged, so compliance could
not be faked by entering a batch of episodes after the fact.
Participants' compliance with the instruction to record lapses
is also made more credible by their extraordinary compliance
with other aspects of the protocol; for example, participants
responded to 90% of the ED's prompts within 2 min. Thus, although
participants' reports of timely recording cannot be indepen-
dently confirmed, they are fairly credible.
It is also conceivable that discrepancies between real-time
reports and those made 12 weeks later were due to reconsidera-
tion, rather than poor recall. Some aspects of the lapse, such as
triggers and attributions, could become clearer with the perspec-
tive of time. It is also possible that participants might be more
honest about their behavior (e.g., how much they smoked) in
retrospect than they were at the time of the episode, when they
may have been embarrassed to report truthfully. However, these
explanations are undermined by the finding that recall was equally
poor for activities at the time of the lapse, which should not be
subject to such reconsideration.
The generalizability from our study to other recall contexts
is uncertain. We studied recall of one type of event over a narrow
range of recall intervals. Although we found no relationship
between recall interval and accuracy, accuracy is probably better
over shorter intervals. Our findings on smoking lapses may not
apply to recall of unique events of high personal significance.
Although smokers report that quitting smoking is a highly mean-
ingful event (Shiffman, Read, & Jarvik, 1983), most lapses
followed only a few days of abstinence, and many participants
had quit and relapsed several times before, which may have
diminished the significance of the lapses we analyzed. Still, our
study involved recording and recall of a meaningful event by a
community sample, whereas most studies of autobiographical
memory have used college students or professors recalling a
stream of mundane events (e.g., Mingay, Shevell, Bradburn, &
Ramirez, 1994).
It is not clear how participants' continuous monitoring might
have influenced the results. Recording a lapse might fix it in
memory; recording many similar events could interfere with
recall. Previous reports indicate that diary keeping generally has
little effect on recall (e.g., Mingay et al., 1994). Finally, the
fact that a palm-top computer was used to record the original
event, whereas a written questionnaire was used to report recall,
seems to have introduced method effects into our comparisons,
and thus impeded our assessment of bias.
In any case, our analyses suggest that participants cannot
accurately recall the details of a potentially meaningful event
after a period of a few months. The implications for research
and for clinical practice are sobering. Diagnosis and treatment
planning rely heavily on patients' reports of past experiences;
if recall is inaccurate, some of this effort may be misdirected.
The need for accurate real-time data about real-life experiences
is one of the driving forces behind the trend toward Ecological
Momentary Assessment (EMA) methods (Stone & Shiffman,
1994), in which information about small events is recorded in
real time and in people's natural environments. EMA methods
may have clinical assessment applications. Where retrospective
accounts must be used, recall accuracy is likely to be improved
by the use of "cognitive interviewing" methods that are sensi-
tive to the workings of autobiographical memory (Means, Hab-
ina, Swan, & Jack, 1992). Further development of such recall
methods and of real-time data collection should be an urgent
priority for psychological research and practice.
6 In fact, 69% of the participants were smoking on a daily basis at
the time of recall, averaging 18.42 (SD = 9.66) cigarettes per day. The
average total number of cigarettes smoked since the first lapse was 693
(SD = 688.8).
References
American Psychiatric Association. (1994). Diagnostic and statistical
manual of mental disorders (4th ed.). Washington, DC: Author.
Bradburn, N. M., Rips, L. J., & Shevell, S. K. (1987). Answering auto-
biographical questions: The impact of memory and inference on sur-
veys. Science, 236, 157-161.
Brewer, W. F. (1988). Memory for randomly sampled autobiographical
events. In U. Neisser & E. Winograd (Eds.), Remembering reconsid-
ered: Ecological and traditional approaches to the study of memory
(pp. 21-90). Cambridge, England: Cambridge University Press.
Brewer, W. F. (1994). Autobiographical memory and survey research.
In N. Schwarz & S. Sudman (Eds.), Autobiographical memory and
the validity of retrospective reports (pp. 11-20). New York: Springer-
Verlag.
Brown, N. R., Rips, L. J., & Shevell, S. K. (1985). The subjective dates
of natural events in very-long-term memory. Cognitive Psychology,
17, 139-177.
Brownell, K., Marlatt, G. A., Lichtenstein, E., & Wilson, G. T. (1986).
Understanding and preventing relapse. American Psychologist, 41,
765-782.
Clark, D. M., & Teasdale, J. D. (1982). Diurnal variation in clinical
depression and accessibility of memories of positive and negative
experiences. Journal of Abnormal Psychology, 91, 87-95.
Cronbach, L. J., & Gleser, G. C. (1953). Assessing similarity between
profiles. Psychological Bulletin, 50, 456-473.
Csikszentmihalyi, M., & Larsen, R. (1987). Validity and reliability of
the experience-sampling method. Journal of Nervous and Mental
Disease, 175, 509-513.
Curry, S., Marlatt, G. A., & Gordon, J. R. (1987). Abstinence violation
effect: Validation of an attributional construct with smoking cessation.
Journal of Consulting and Clinical Psychology, 55, 145-149.
Eisenhower, D., Mathiowetz, N. A., & Morganstein, D. (1991). Recall
error: Sources and bias reduction techniques. In P. P. Biemer, R. M.
Groves, L. E. Lyberg, N. A. Mathiowetz, & S. Sudman (Eds.), Mea-
surement errors in surveys (pp. 127-144). New York: Wiley.
Friedman, W. J. (1993). Memory for the time of past events. Psychologi-
cal Bulletin, 113, 44-66.
Harsch, N., & Neisser, U. (1989, November). Substantial and
irreversible errors in flashbulb memories of the Challenger explosion. Poster
presented at the annual meeting of the Psychonomic Society, Atlanta.
Hodgins, D. C., el-Guebaly, N., & Armstrong, S. (1995). Prospective
and retrospective reports of mood states before relapse to substance
use. Journal of Consulting and Clinical Psychology, 63, 400-407.
Holmberg, D., & Holmes, J. G. (1994). Reconstruction of relationship
memories: A mental models approach. In N. Schwarz & S. Sudman
(Eds.), Autobiographical memory and the validity of retrospective
reports (pp. 267-288). New York: Springer-Verlag.
Larsen, R. J., & Diener, E. (1992). Promises and problems with the
circumplex model of emotion. In M. Clark (Ed.), Review of personal-
ity and social psychology (pp. 25-59). Newbury Park, CA: Sage.
Lister, R. G., Eckardt, M. J., & Weingartner, H. (1987). Ethanol intoxi-
cation and memory: Recent developments and new directions. In M.
Galanter (Ed.), Recent developments in alcoholism (pp. 111-126).
New York: Plenum Press.
Marlatt, G. A., & Gordon, J. R. (1985). Relapse prevention. New York:
Guilford Press.
McFall, R. M. (1977). Parameters of self-monitoring. In R. B. Stuart
(Ed.), Behavioral self-management: Strategies, techniques, and out-
come. New York: Brunner/Mazel.
McKay, J. R., O'Farrell, T. J., Maisto, S. A., & Connors, G. J. (1989).
Biases in relapse attributions made by alcoholics and their wives.
Addictive Behaviors, 14, 513-522.
Means, B., Habina, K., Swan, G. E., & Jack, L. (1992). Cognitive
research on response error in survey questions on smoking
(Publication No. PHS 92-1080). Hyattsville, MD: U.S. Department of Health
and Human Services.
Meddis, R. (1972). Bipolar factors in mood adjective checklists. British
Journal of Social and Clinical Psychology, 11, 178-184.
Mingay, D. J., Shevell, S. K., Bradburn, N. M., & Ramirez, C. (1994).
Self and proxy reports of everyday events. In N. Schwarz & S. Sudman
(Eds.), Autobiographical memory and the validity of retrospective
reports (pp. 235-250). New York: Springer-Verlag.
O'Connell, K. A., & Martin, E. J. (1987). Highly tempting situations
associated with abstinence, temporary lapse, and relapse among parti-
cipants in smoking cessation programs. Journal of Consulting and
Clinical Psychology, 55, 367-371.
Ross, M. (1989). Relation of implicit theories to the construction of
personal histories. Psychological Review, 96, 341-357.
Shiffman, S., Hickcox, M., Paty, J. A., Gnys, M., Kassel, J. D., & Rich-
ards, T. (1996). Progression from a smoking lapse to relapse: Predic-
tion from abstinence violation effects and nicotine dependence. Jour-
nal of Consulting and Clinical Psychology, 64, 993-1002.
Shiffman, S., Hickcox, M., Paty, J. A., Gnys, M., Kassel, J. D., & Rich-
ards, T. (in press). The abstinence violation effect following smoking
lapses and temptations. Cognitive Therapy and Research.
Shiffman, S., Paty, J. A., Gnys, M., Kassel, J. D., & Hickcox, M. (1996).
First lapses to smoking: Within-subjects analysis of real time reports.
Journal of Consulting and Clinical Psychology, 64, 366-379.
Shiffman, S., Paty, J. A., & Kassel, J. D. (1996). Using palm-top
computers for field assessment of behavior. Manuscript submitted for
publication.
Shiffman, S., Read, L., & Jarvik, M. E. (1983, August). The effect of
stressful events on relapse in exsmokers. In S. Shiffman (Chair),
Stress and smoking: Effects on initiation, maintenance, and relapse.
Symposium conducted at the 91st annual convention of the American
Psychological Association, Anaheim, CA.
Sobell, L. C., & Sobell, M. B. (1992). Timeline follow-back: A tech-
nique for assessing self-reported alcohol consumption. In R. Z. Lit-
ten & J. Allen (Eds.), Measuring alcohol consumption: Psychosocial
and biological methods (pp. 41-72). Totowa, NJ: Humana Press.
Stone, A. A., & Shiffman, S. (1994). Ecological momentary assessment
(EMA) in behavioral medicine. Annals of Behavioral Medicine, 16,
199-202.
Sutton, S. R. (1992). Are 'risky' situations really risky? Review and
critique of the situational approach to smoking relapse. Journal of
Smoking-Related Disorders, 3, 79-84.
Teasdale, J. D., & Fogarty, S. J. (1979). Differential effects of induced
mood on retrieval of pleasant and unpleasant events from episodic
memory. Journal of Abnormal Psychology, 88, 248-257.
Thompson, C. P., Skowronski, J. J., & Lee, D. J. (1988). Telescoping in
dating naturally occurring events. Memory & Cognition, 16, 461-468.
Wills, T. A., & Shiffman, S. (1985). Coping behavior and its relation
to substance use: A conceptual framework. In S. Shiffman & T. A.
Wills (Eds.), Coping and substance use (pp. 3-24). New York: Aca-
demic Press.
Received November 22, 1995
Revision received March 8, 1996
Accepted June 9, 1996