
Decision makers calibrate behavioral persistence on the basis

of time-interval experience

Joseph T. McGuire⇑, Joseph W. Kable

Department of Psychology, University of Pennsylvania, 3720 Walnut St., Philadelphia, PA 19104, USA

Article info

Article history:

Received 21 October 2011

Revised 14 March 2012

Accepted 22 March 2012

Available online 23 April 2012

Keywords:

Decision making

Intertemporal choice

Dynamic inconsistency

Statistical learning

Interval timing

Abstract

A central question in intertemporal decision making is why people reverse their own past

choices. Someone who initially prefers a long-run outcome might fail to maintain that pref-

erence for long enough to see the outcome realized. Such behavior is usually understood as

reflecting preference instability or self-control failure. However, if a decision maker is

unsure exactly how long an awaited outcome will be delayed, a reversal can constitute

the rational, utility-maximizing course of action. In the present behavioral experiments,

we placed participants in timing environments where persistence toward delayed rewards

was either productive or counterproductive. Our results show that human decision makers

are responsive to statistical timing cues, modulating their level of persistence according to

the distribution of delay durations they encounter. We conclude that temporal expecta-

tions act as a powerful and adaptive influence on people’s tendency to sustain patient

decisions.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

1.1. Failures of persistence

Intertemporal decision behavior can appear to be

dynamically inconsistent. As Ainslie (1975) framed the

problem, ‘‘people often change their preferences as time

passes, even though they have found out nothing new

about their situation’’ (p. 464). Reversals of choices in do-

mains as diverse and consequential as diet, addiction,

and financial planning create the impression that prefer-

ences are fundamentally unstable. Understanding the

cause of these reversals is important, since a tendency to

sustain the pursuit of delayed rewards correlates with

numerous positive life outcomes (Duckworth & Seligman,

2005; Mischel, Shoda, & Peake, 1988; Shoda, Mischel, &

Peake, 1990).

The predominant theoretical explanations for such

reversals hold that multiple internal subsystems trade off

control over behavior. The relevant subsystems have been

variously characterized as cool vs. hot (Loewenstein,

1996; Metcalfe & Mischel, 1999), controlled vs. automatic

(Baumeister, Bratslavsky, Muraven, & Tice, 1998; Stanovich

& West, 2000), farsighted vs. myopic (Laibson, 1997;

McClure, Laibson, Loewenstein, & Cohen, 2004) or instru-

mental vs. Pavlovian (Dayan, Niv, Seymour, & Daw, 2006).

A related idea is that preference instability can arise from

non-exponential temporal discounting functions (Ainslie,

1975; Laibson, 1997; McClure et al., 2004; Strotz, 1955).

Previous theoretical enterprises have focused largely on

situations where decision makers hold full information

about the times at which future outcomes will occur.

However, the timing of real-world events is not always

so predictable. Decision makers routinely wait for buses,

job offers, weight loss, and other outcomes characterized

by significant temporal uncertainty. Timing uncertainty is

also a central feature of the well-known delay-of-gratifica-

tion paradigm (Mischel & Ebbesen, 1970), where young

children must decide how long to continue waiting for a

0010-0277/$ - see front matter © 2012 Elsevier B.V. All rights reserved.

http://dx.doi.org/10.1016/j.cognition.2012.03.008

⇑Corresponding author. Tel.: +1 215 746 4371; fax: +1 215 898 7301.

E-mail addresses: mcguirej@psych.upenn.edu (J.T. McGuire), kable@psych.upenn.edu (J.W. Kable).

Cognition 124 (2012) 216–226

Contents lists available at SciVerse ScienceDirect

Cognition

journal homepage: www.elsevier.com/locate/COGNIT


preferred food reward, while lacking any information

about how long the delay will last. Even though persis-

tence is usually associated with successful self-control,

temporal uncertainty can create situations where limits

on persistence are appropriate (Dasgupta & Maskin,

2005; Rachlin, 2000). Our aim in the present paper is to

demonstrate that behavior resembling persistence failure

can arise as the rational response to uncertainty about an

awaited outcome’s timing.

1.2. Persistence under temporal uncertainty

A temporally uncertain outcome can be described in

terms of a probability distribution over its potential times

of arrival. Different timing distributions will apply to dif-

ferent categories of events, and the shape of the distribu-

tion determines how the expected remaining delay will

change as time passes. This general phenomenon has been

described previously in the contexts of survival and reli-

ability analysis (e.g., Elandt-Johnson & Johnson, 1980)

and Bayesian cognitive judgment (Griffiths & Tenenbaum,

2006). Here we present an overview focusing on the impli-

cations for intertemporal decision making (for quantitative

details see Fig. 2 and Section 2.3).

If delay durations in a given environment follow a uni-

form or Gaussian distribution, the expected remaining de-

lay will become steadily shorter as time elapses. Gaussian

distributions characterize delimited events, such as movies

or human lifetimes (Griffiths & Tenenbaum, 2006). Con-

sider, for example, the case of waiting for a talk to end. If

it has gone on longer than expected, one might be inclined

to assume that only a small amount of time still remains.

Fig. 1 illustrates this phenomenon for a Gaussian distribu-

tion (specifically, a truncated Gaussian with a lower bound

corresponding to the current time).

Under the standard assumption that rewards are

subjectively discounted as a function of their delay

(Samuelson, 1937), rewards with Gaussian timing will

tend to increase in present subjective value over time

while they are being awaited. If a delayed reward is ini-

tially preferred relative to other alternatives that are avail-

able immediately, this preference should strengthen as

time passes. All else equal, the initial patient choice should

be sustained.

A very different situation can occur if the reward’s tim-

ing follows a heavy-tailed distribution (e.g., a power func-

tion; see Fig. 1). In this case, the expected remaining delay

can increase with the passage of time. Heavy-tailed distri-

butions describe open-ended events, where some delays

are short but others are indefinitely long. Consider the

example of waiting for a reply to an email (Barabási,

2005). One might initially expect a reply to come quickly,

but if it does not, one might conclude that the remaining

delay will be longer than initially expected.
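The contrast between the two cases can be checked numerically. The sketch below is illustrative rather than the authors' own computation: it uses the parameter values given in the Fig. 1 caption (Gaussian mean 3, SD 1; generalized Pareto shape 0.5, scale 1.5) together with two standard closed forms, the truncated-normal mean and the generalized Pareto mean residual life. The function names are hypothetical.

```python
import math

def gaussian_remaining(t, mu=3.0, sd=1.0):
    """Expected remaining delay E[D - t | D > t] under a Gaussian belief."""
    z = (t - mu) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(D > t)
    # Mean of a normal truncated below at t, minus the time already waited.
    return mu + sd * pdf / tail - t

def pareto_remaining(t, k=0.5, sigma=1.5):
    """Expected remaining delay under a generalized Pareto belief (needs k < 1)."""
    # Mean residual life of a generalized Pareto with location zero.
    return (sigma + k * t) / (1.0 - k)

for t in (0.0, 2.0, 4.0):
    print(f"waited {t:.0f} min  Gaussian: {gaussian_remaining(t):.2f}  "
          f"heavy-tailed: {pareto_remaining(t):.2f}")
```

Under these assumptions the Gaussian estimate shrinks as waiting time accumulates, whereas the heavy-tailed estimate grows linearly with time already waited.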

If a reward is characterized by a heavy-tailed timing

distribution, its expected delivery time grows more distant

with time elapsed, implying that its present subjective va-

lue progressively deteriorates. Even if the delayed reward

were initially preferred, it might eventually become so re-

mote that it no longer outcompeted immediately available

alternatives. Under these circumstances, decision makers

could produce reversing sequences of choices, equivalent

to the patterns often attributed to self-control failure: they

might choose a delayed reward, wait for a period of time,

and then shift to an immediate outcome instead. Such a

decision maker would not be dynamically inconsistent,

but would instead be responding rationally to new infor-

mation gained from observing the passage of time. There

is precedent for the idea that mere time passage may be

informative in this way, warranting reassessments of both

the delay and the degree of risk associated with future

events (Dasgupta & Maskin, 2005; Fawcett, McNamara, &

Houston, 2012; Rachlin, 2000; Sozou, 1998).

Heavy-tailed distributions characterize timing in a vari-

ety of real-life situations where intervals are open-ended.

Distributions with this form have been empirically docu-

mented in examinations of the time between emails

(Barabási, 2005), the length of hospital stays (Harrison &

Millard, 1991), and time between retrievals of the same

memory (Anderson & Schooler, 1991). Heavy-tailed distributions

also provide a reasonable prior when the true distribution

is unknown (Gott, 1993, 1994; Jeffreys, 1983). It

seems plausible that decision makers routinely encounter

environments characterized by heavy-tailed timing statis-

tics, in which they must continually reassess whether a for-

merly preferred delayed outcome remains worth pursuing.

Decision makers are also likely to encounter situations

where timing is uncertain but delimited. For example,

endogenous variability in time-interval perception and

memory can produce a Gaussian pattern of subjective

uncertainty (i.e., scalar variability; Gallistel & Gibbon,

2000; Gibbon, 1977). This kind of situation would call for

persistence: if a delayed reward was worth pursuing in

the first place, it should be pursued until it is obtained.

[Fig. 1 graphic: six panels plotting probability density (0 to 0.5) against time (0 to 10 min), arranged in two columns labeled Gaussian and Heavy-tailed; markers indicate the current time and the expected arrival.]

Fig. 1. Schematic illustration of how time passage may change a reward’s
expected time of arrival. The left and right columns represent different
kinds of beliefs one might hold about an awaited outcome’s timing. The
solid line represents the current time (shown at 0, 2, and 4 min). The
dashed line represents the outcome’s expected arrival time, defined as the
mean of the area to the right of the current time. For Gaussian beliefs
(mean = 3, SD = 1), the expected delay starts at 3 min and grows shorter
with time. For heavy-tailed beliefs (generalized Pareto distribution (see
Eq. (2)), k = 0.5, σ = 1.5), the delay starts at 3 min and rises with time.

The above observations lead to a hypothesis: a person’s

willingness to continue waiting ought to depend on a

dynamically updated estimate of the time at which an

awaited outcome will arrive. This estimate, in turn, should

depend on the applicable timing statistics. Environments

with Gaussian or uniform timing statistics should elicit

strong persistence. In contrast, environments character-

ized by heavy-tailed timing statistics should cause people

to limit how long they are willing to wait.

Existing evidence suggests it is plausible that people

form context-sensitive time estimates and update these

estimates dynamically. Properties of statistical distribu-

tions can be encoded rapidly from direct experience

(Körding & Wolpert, 2004), and processes resembling valid

Bayesian inference support both explicit temporal judgments

(Griffiths & Tenenbaum, 2006, 2011; Jazayeri & Shadlen, 2010)

and time-dependent reward-seeking

behavior (Balci, Freestone, & Gallistel, 2009; Bateson &

Kacelnik, 1995; Catania & Reynolds, 1968). However, little

evidence as yet bears on the role of temporal inference

during choices that involve waiting for delayed outcomes.

Even though preference reversals may sometimes be theoretically

rational (Dasgupta & Maskin, 2005; Fawcett et al.,

2012), empirical data to date have been interpreted largely

in terms of limitations on people’s capacity to exert

self-control (Baumeister et al., 1998).

1.3. The present work

Here we seek direct empirical evidence that human

decision makers calibrate their willingness to tolerate de-

lay on the basis of experience with time-interval distribu-

tions. Participants in our first experiment were given

repeated opportunities to wait for randomly timed delayed

rewards, and could decide at any time to stop waiting and

accept a small immediate reward instead. We placed

participants in environments with either uniform or hea-

vy-tailed distributions of time intervals, hypothesizing that

the two conditions would elicit different degrees of will-

ingness to persist.

2. Experiment 1

2.1. Overview

Participants were given a fixed time period to harvest

monetary rewards. They therefore faced a rate-maximiza-

tion objective, akin to a foraging problem. Each reward

took a random length of time to arrive, and participants

could wait for only one reward at a time. At any time they

could quit waiting, receive a small immediate reward, and

continue to a new trial after a short inter-trial interval. De-

lay durations were governed by different probability distri-

butions in two groups of participants.

One group experienced a uniform distribution (UD; see

Fig. 2A), spanning 0–12 s. The expected remaining delay

declined over time, and the reward-maximizing strategy

was always to continue waiting (see Section 2.3). To under-

stand this intuitively, consider a decision maker who has

already waited 6 s. The delayed reward is guaranteed to ar-

rive within the next 6 s, and is therefore an even better

prospect than initially, when it was guaranteed to arrive

within 12 s. If the delayed reward was preferred at the out-

set, it should be preferred by a still greater margin after

some time has passed.
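This intuition can be written out explicitly. Assuming a delay D that is uniform on 0–12 s, conditioning on the reward not having arrived by time t leaves D uniform on the interval from t to 12 s, so the expected remaining delay is

\[
E[D - t \mid D > t] = \frac{12 - t}{2}, \qquad 0 \le t < 12 .
\]

The expected remaining wait is therefore 6 s at the outset and 3 s after 6 s of waiting, and it keeps falling toward zero as t approaches 12 s.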

The second group experienced a truncated heavy-tailed

distribution of delays (HTD group; see Fig. 2A). Here the

expected remaining delay initially increased with time

waited. The maximizing strategy called for quitting when-

ever the reward failed to arrive within the first few seconds

(for details, see Section 2.3). If participants calibrate persis-

tence adaptively, they should exhibit greater persistence in

the UD condition than the HTD condition.

2.2. Methods

2.2.1. Participants

Participants were recruited in a New Jersey shopping

mall (n = 40; 23 female), age 18–64 (mean = 32), with

11–20 years of education (mean = 15). Each participant

was randomly assigned to either the UD or HTD condition

(n = 20 each). The proportion female was 10/20 in the UD

group and 13/20 in the HTD group. The two groups did

not significantly differ with respect to age (UD group median = 25,

interquartile range [IQR] = 20.5–48.5; HTD group

median = 24.5, IQR = 22.5–44.5; Mann–Whitney U = 194,

$n_{UD}$ = 20, $n_{HTD}$ = 20, p = 0.88) or years of education (based

on the 35 participants who reported their level of education;

UD group median = 15, IQR = 12–16; HTD group median = 15,

IQR = 14–16; Mann–Whitney U = 137, $n_{UD}$ = 17,

$n_{HTD}$ = 18, p = 0.61).

Assignment to conditions was automated and con-

cealed from the experimenter, and all participants received

identical instructions. Participants were informed that

they could expect to make $5–10 depending on perfor-

mance, but were not told anything about the distribution

of possible delay times. In both experiments, procedures

for testing human subjects were approved by the applica-

ble institutional review board.

2.2.2. Materials and procedure

The task was programmed using the Psychophysics

Toolbox (Brainard, 1997; Pelli, 1997) extensions for Matlab

(The MathWorks, Natick, MA). Fig. 3 shows the interface. A

yellow light would stay lit for a random duration before

delivering a 15¢ reward. Participants could choose to wait

by leaving the mouse cursor in a box marked, ‘‘Wait for

15¢.’’ Alternatively, by shifting to a box marked ‘‘Take

1¢,’’ participants could receive 1¢ and proceed to a new

trial. Each outcome (15¢ or 1¢) was followed by a 2-s in-

ter-trial interval (ITI). The cursor could remain in either

box across multiple trials. The task duration was 10 min,

and the screen continuously displayed the time remaining

and total earned. Final compensation was rounded up to

the next 25¢.

Delays varied randomly from trial to trial, and were

scheduled according to a different distribution in each con-

dition (see Fig. 2A). The large reward was delivered at the

end of the scheduled delay on each trial unless the partic-

ipant chose to take the small reward earlier. For the UD

group, delays were drawn from a continuous uniform dis-

tribution described by the following cumulative distribu-

tion function:


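\[
F_{\mathrm{Unif}}(t) =
\begin{cases}
0 & \text{for } t \le a \\[4pt]
\dfrac{t - a}{b - a} & \text{for } a < t < b \\[4pt]
1 & \text{for } t \ge b
\end{cases}
\tag{1}
\]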

Parameters were a = 0 and b = 12 s, so quartile upper

boundaries fell at 3, 6, 9, and 12 s.

For the heavy-tailed distribution we used a truncated

generalized Pareto distribution. An unbounded generalized

Pareto distribution has the following cumulative distribu-

tion function:

\[
F_{\mathrm{GP}}(t) = 1 - \left(1 + \frac{k t}{\sigma}\right)^{-1/k}
\tag{2}
\]

Note that Eq. (2) omits the location parameter θ, which

we set to zero, implying that zero is the shortest possible

delay. Applying an upper bound T gives the following

cumulative distribution function for a truncated generalized

Pareto:

\[
F_{\mathrm{GP\text{-}trunc}}(t) =
\begin{cases}
\dfrac{F_{\mathrm{GP}}(t)}{F_{\mathrm{GP}}(T)} & \text{for } t \le T \\[4pt]
1 & \text{for } t > T
\end{cases}
\tag{3}
\]

We used parameters k = 8, σ = 3.4, and T = 90 s, which

set quartile upper boundaries at 0.78, 3.56, 15.88, and 90 s.
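As a check on these parameters, the truncated distribution can be inverted analytically. The following is a minimal sketch, not the authors' code: it assumes shape 8, scale 3.4, and an upper bound of 90 s as stated above, and the function names are hypothetical. Solving Eq. (3) for t at cumulative probability q should land close to the quartile boundaries reported in the text.

```python
K, SIGMA, T_MAX = 8.0, 3.4, 90.0  # shape, scale, truncation point from the text

def gp_cdf(t):
    """Unbounded generalized Pareto CDF of Eq. (2), with location zero."""
    return 1.0 - (1.0 + K * t / SIGMA) ** (-1.0 / K)

def trunc_quantile(q):
    """Delay at cumulative probability q under the truncated CDF of Eq. (3)."""
    target = q * gp_cdf(T_MAX)  # undo the normalization by F_GP(T)
    # Invert Eq. (2): solve 1 - (1 + k t / sigma)^(-1/k) = target for t.
    return (SIGMA / K) * ((1.0 - target) ** (-K) - 1.0)

quartiles = [trunc_quantile(q) for q in (0.25, 0.5, 0.75, 1.0)]
print([round(x, 2) for x in quartiles])
```

The same inverse function can double as a sampler, since feeding it uniform random numbers between 0 and 1 yields delays from the truncated distribution.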

[Fig. 2 graphic: panels A–D. Panels A and C plot cumulative probability (0 to 1) against delay length (0 to 20 s); panels B and D plot return ($0 to $12) against waiting policy (0 to 20 s). Panels A and B (Experiment 1) show the UD and HTD groups; panels C and D (Experiment 2) also show the BD group.]

Fig. 2. Delay distributions and the resulting payoff functions in each experiment. (A) Cumulative probability of the large reward arriving after a given delay
length, for each condition of Experiment 1. Delays in the UD group followed a uniform distribution (lower bound a = 0, upper bound b = 12). Delays in the
HTD group followed a generalized Pareto distribution (shape k = 8, scale σ = 3.4, location θ = 0) truncated at 90 s. (B) Expected total monetary return under a
range of waiting policies for Experiment 1. A waiting policy is defined by the time at which a subject would give up waiting if the reward had not yet been
delivered. Arrows mark the optimal waiting policy in each condition. (C) Cumulative probability distributions for delay lengths in Experiment 2. The UD
group received a uniform distribution (a = 0, b = 12), the HTD group received a truncated generalized Pareto distribution (k = 4, σ = 5.75, θ = 0, truncated at
60 s), and the BD group received a scaled beta distribution (shape parameters a = 0.25, b = 0.25, scaling factor of 12). (D) Expected total monetary return
under a range of waiting policies for Experiment 2. Arrows mark each condition’s reward-maximizing policy.

[Fig. 3 graphic: task screen with two boxes labeled “Take 1¢” and “Wait for 15¢”, plus the readouts “Time left: 09:51” and “Amount won: $0.15”.]

Fig. 3. Interface for the behavioral choice task used in both experiments.

We wished to ensure that even short spans of experience

would be representative of the underlying distribution.

To accomplish this, delays were not drawn fully

randomly on each trial, but were sampled from each quartile

in random order before a quartile was repeated. This

approach has the disadvantage of introducing subtle

sequential structure, but the important advantage of

reducing within-condition variability in the timing statis-

tics participants experienced.
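One way to implement quartile-balanced scheduling of this kind is sketched below. This is an illustration under stated assumptions, not the authors' Matlab code: it assumes that blocks of four trials each visit the four quartiles in shuffled order, and that within a quartile the delay is drawn by inverse-CDF sampling. The text does not specify the within-quartile sampling rule, and all function names are hypothetical.

```python
import random

K, SIGMA, T_MAX = 8.0, 3.4, 90.0  # HTD parameters from the text

def gp_cdf(t):
    """Unbounded generalized Pareto CDF with location zero."""
    return 1.0 - (1.0 + K * t / SIGMA) ** (-1.0 / K)

def htd_quantile(u):
    """Inverse CDF of the truncated generalized Pareto (assumed sampler)."""
    return (SIGMA / K) * ((1.0 - u * gp_cdf(T_MAX)) ** (-K) - 1.0)

def ud_quantile(u):
    """Inverse CDF of the uniform distribution on 0-12 s."""
    return 12.0 * u

def balanced_delays(n_trials, quantile_fn, rng):
    """Draw delays so that every block of four trials covers all four quartiles."""
    delays = []
    while len(delays) < n_trials:
        order = [0, 1, 2, 3]
        rng.shuffle(order)                 # random quartile order within the block
        for qi in order:
            u = (qi + rng.random()) / 4.0  # uniform position inside quartile qi
            delays.append(quantile_fn(u))
    return delays[:n_trials]

rng = random.Random(0)  # fixed seed only so the illustration is repeatable
htd_delays = balanced_delays(8, htd_quantile, rng)
ud_delays = balanced_delays(8, ud_quantile, rng)
```

With this scheme any run of four consecutive trials aligned to a block boundary contains exactly one delay from each quartile, which is the source of the sequential structure noted above.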

Two demonstration trials preceded the main task. On

the first, participants were instructed to wait for the large

reward, which arrived after 5 s. On the second, participants

were instructed to take the small reward.

2.3. Normative analysis

We define a waiting policy as the time at which a deci-

sion maker will give up waiting on each trial if the large re-

ward has not yet arrived. The expected return for a policy

of quitting at time t may be calculated as follows. Let $p_t$ be
the proportion of rewards delivered earlier than t. Let $s_t$ be
the mean duration of these rewarded trials. One trial’s expected
return, in dollars, is $R_t = 0.15\,p_t + 0.01\,(1 - p_t)$. Its
expected cost, in seconds, is $C_t = s_t\,p_t + t\,(1 - p_t) + 2$
(including the 2-s ITI). The expected return for policy t over
the 600-s experiment is $600 \times R_t / C_t$. This is the quantity
participants should seek to maximize.

For each condition, we calculated the expected return
for a grid of waiting policies spaced every 0.01 s from 0
to 20 s. For policy t, the large-reward probability $p_t$ is simply
the value of the cumulative probability distribution
function at t (see Eqs. (1)–(3)). The value of $s_t$ is easy to
calculate in the UD condition: $s_t = t/2$. For the HTD condition
the calculation of $s_t$ is more complex, though still tractable;
in practice, we estimated $s_t$ by taking the mean of 100,000
random samples from the distribution between 0 and t.

Fig. 2B shows the expected monetary return for a range
of waiting policies. At one extreme, a policy of quitting
every trial immediately yields 1¢ every 2 s, for $3.00 total
(in either condition). At the other extreme, complete persistence
in the UD condition would yield 15¢ every 8 s on
average, for $11.25. Complete persistence in the HTD condition
would yield poorer results, with a large reward
occurring approximately every 15 s on average, leading
to an expected return of $6.00. The best-performing policy
in the HTD condition is to quit if the reward has not arrived
after 2.13 s; this yields an expected return of $11.43. A participant
who perfectly implemented this policy would obtain
the large reward on 41% of trials, with an average
delay on these trials of 725 ms. On the remaining trials
the small reward would be selected after a wait of 2.13 s.
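The policy grid described in this section can be reproduced approximately with a short script. The sketch below is not the authors' implementation: in place of their 100,000-sample Monte Carlo estimate of the mean rewarded-trial duration, it substitutes trapezoidal integration of the cumulative distribution function, using the identity that the mean rewarded duration times the reward probability equals t times the reward probability minus the integral of the CDF from 0 to t. Function and constant names are hypothetical.

```python
K, SIGMA, T_MAX = 8.0, 3.4, 90.0  # HTD parameters from the text
A, B = 0.0, 12.0                  # UD bounds from the text
ITI, SESSION = 2.0, 600.0         # seconds
BIG, SMALL = 0.15, 0.01           # dollars

def gp_cdf(t):
    return 1.0 - (1.0 + K * t / SIGMA) ** (-1.0 / K)

def htd_cdf(t):
    return 1.0 if t >= T_MAX else gp_cdf(t) / gp_cdf(T_MAX)

def ud_cdf(t):
    return min(max((t - A) / (B - A), 0.0), 1.0)

def expected_returns(cdf, step=0.01, t_end=20.0):
    """Return (policy, expected session return) pairs on a grid of quit times."""
    results = []
    area, prev = 0.0, cdf(0.0)  # area: running integral of the CDF from 0 to t
    for i in range(int(round(t_end / step)) + 1):
        t = i * step
        p = cdf(t)              # probability the large reward arrives before t
        if i > 0:
            area += 0.5 * (prev + p) * step
        prev = p
        reward = BIG * p + SMALL * (1.0 - p)        # expected dollars per trial
        rewarded_time = t * p - area                # mean rewarded duration times p
        cost = rewarded_time + t * (1.0 - p) + ITI  # expected seconds per trial
        results.append((t, SESSION * reward / cost))
    return results

ud_grid = expected_returns(ud_cdf)
htd_grid = expected_returns(htd_cdf)
best_ud = max(ud_grid, key=lambda pair: pair[1])
best_htd = max(htd_grid, key=lambda pair: pair[1])
print(f"UD best policy: {best_ud[0]:.2f} s, return ${best_ud[1]:.2f}")
print(f"HTD best policy: {best_htd[0]:.2f} s, return ${best_htd[1]:.2f}")
```

Because the HTD payoff curve is fairly flat near its peak, small differences between numerical integration and random sampling can shift the estimated optimum slightly, so agreement with the reported values should be expected only approximately.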

2.4. Data analyses

Individual trials differ in the amount of information

they provide regarding a participant’s waiting policy. Quit

trials are the most informative, as they offer a direct esti-

mate of the limit on an individual’s willingness to persist.

When a reward is delivered, however, we observe only that

the person was willing to wait at least the duration of the

trial. We accommodate this situation using statistical

methods from survival analysis. Analyses assessed how

long a trial would ‘‘survive’’ without the participant quit-

ting. Rewarded trials were considered right-censored,

analogous to patients who drop out of a clinical study

and yield only a lower bound on their survival.

We constructed a Kaplan–Meier survival curve on the

basis of each participant’s responses. The Kaplan–Meier is

a nonparametric estimator of the survival function (Kaplan

& Meier, 1958). For each time t, it plots the participant’s

probability of waiting at least until t if the reward is not

delivered earlier. Analyses were restricted to the 0–11 s

interval for which we have observations in both condi-

tions. (Note that we can only observe an individual’s will-

ingness to wait t seconds if we have trials where the

scheduled delay equals or exceeds t.) The area under the

survival curve (AUC) is a useful summary statistic, repre-

senting the average number of seconds an individual was

willing to wait within the analyzed interval. Someone

who never quit earlier than 11 s would have an AUC of

11. One who was willing to wait up to 3 s on half the trials

and up to 9 s on the other half would have an AUC of 6.
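The estimator and its summary statistic can be illustrated with a compact sketch. This is a generic Kaplan–Meier implementation written for illustration, not the authors' analysis code: quit trials are treated as events, rewarded trials as right-censored observations, and the function names are hypothetical.

```python
def kaplan_meier(durations, quit_flags):
    """Return [(time, survival)] steps; quits are events, rewards are censored."""
    quit_times = sorted({d for d, q in zip(durations, quit_flags) if q})
    survival, steps = 1.0, []
    for t in quit_times:
        at_risk = sum(1 for d in durations if d >= t)  # trials still running at t
        quits = sum(1 for d, q in zip(durations, quit_flags) if q and d == t)
        survival *= 1.0 - quits / at_risk
        steps.append((t, survival))
    return steps

def area_under_curve(steps, upper=11.0):
    """Integrate the step-shaped survival curve from 0 to `upper` seconds."""
    area, last_t, last_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= upper:
            break
        area += last_s * (t - last_t)
        last_t, last_s = t, s
    return area + last_s * (upper - last_t)

# Worked example from the text: quits at 3 s on half the trials, 9 s on the rest.
example = kaplan_meier([3.0, 9.0, 3.0, 9.0], [True, True, True, True])
print(area_under_curve(example))
```

A participant whose trials all ended in reward produces no quit events, so the curve stays at 1 and the AUC equals the full 11-s window.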

Fig. 4. Results of Experiment 1. (A) Mean survival curves, with standard

error of the mean (SEM), sampled at 1-s intervals, reflecting participants’

willingness to wait in each condition. (B) Area under the survival curve

(AUC) for each individual, calculated over the range from 0 to 11 s. Values

reflect how long each individual was willing to wait within the first 11 s

of the delay period. (C) Mean estimated willingness to wait (WTW), as a

function of time elapsed in the experimental session (with SEM). Arrows

mark the reward-maximizing policies. (D) Mean survival curves (with

SEM) restricted to trials in which participants waited at least 1 s. (E) Each

individual’s AUC value for the analysis in Panel D.
