Frontiers in Neuroscience www.frontiersin.org October 2012 | Volume 6 | Article 138 | 1
FOCUSED REVIEW
published: 08 October 2012
doi: 10.3389/fnins.2012.00138
Don’t let me do that! – models of precommitment
Zeb Kurth-Nelson¹ and A. David Redish²*
¹ Wellcome Trust Centre for Neuroimaging, University College London, London, UK
² Department of Neuroscience, University of Minnesota, Minneapolis, MN, USA
Precommitment, or taking away a future choice from oneself, is a mechanism for overcoming impulsivity. Here we review recent work suggesting that precommitment can be best explained through a distributed decision-making system with multiple discounting rates. This model makes specific predictions about precommitment behavior and is especially interesting in light of the emerging multiple-systems view of decision-making, in which functional systems with distinct neural substrates use different computational strategies to optimize decisions. Given the growing consensus that impulsivity constitutes a common point of breakdown in decision-making processes, with common neural and computational mechanisms across multiple psychiatric disorders, it is useful to translate precommitment into the common language of temporal difference reinforcement learning that unites many of these behavioral and neural data.
Keywords: discounting function, decision-making, neuroeconomics, temporal difference reinforcement learning, precommitment
Edited by:
Daeyeol Lee, Yale University School of
Medicine, USA
Reviewed by:
Christian C. Luhmann, Stony Brook
University, USA
Xinying Cai, Washington University in St
Louis, USA
Zeb L. Kurth-Nelson is a postdoc at the
Wellcome Trust Centre for Neuroimaging
at University College London. He
received his Ph.D. in Neuroscience from
the University of Minnesota in 2009. His
research interests concern the neural
substrates of decision-making and the
dysfunction of decision-making in
psychiatric disorders such as addiction.
z.kurth-nelson@ucl.ac.uk
It seems illogical on the surface, but humans and other animals sometimes put themselves in situations that prevent them from being offered an option that they would choose if given the chance. They will even expend effort and incur costs to avoid being given the future option. Such restriction of one’s own future choices is called precommitment. It is theorized that precommitment occurs because humans and other animals have different preferences at different times (Strotz, 1955; Ainslie, 1992).
Precommitment behaviors take many forms. They range from purely external mechanisms like flushing cigarettes down the toilet, through intermediate mechanisms like making a public statement about one’s intentions, to purely internal mechanisms like making a promise to oneself that one is unwilling to break.
Precommitment is ubiquitous in human behavior. Christmas Clubs, popularized during the Great Depression, enforced saving through the year for Christmas shopping (Strotz, 1955). In the modern era, websites like stickk.com automatically transfer money from a credit card to a designated recipient (such as a charity) if the user fails to meet a specified goal (as reported by a trusted third party). In Australia, Canada, and Norway, many gambling machines require the gambler to preset a limit on his or her expenditure, after which the machine deactivates (Ladouceur et al., 2012). (Some gamblers also spontaneously create their own precommitment strategies; Wohl et al., 2008; Ladouceur et al., 2012.) In day-to-day life, people place the ice cream out of sight, put money into a retirement account with withdrawal penalties, walk a different route to avoid passing a store that tempts them to buy something, or set themselves deadlines backed by self-imposed penalties (Ariely and Wertenbroch, 2002).
Precommitment behavior has been demonstrated in animals (Rachlin and Green, 1972; Ainslie, 1974), but there is not yet an established laboratory paradigm for eliciting precommitment behavior in humans. Precommitment can be predicted to occur as a direct consequence of time-dependent changes in preference order (Ainslie, 1992). Even so, explicit neural and computational models of precommitment remain limited. In our paper, “A reinforcement learning model of precommitment in decision-making” (Kurth-Nelson and Redish, 2010), we examined whether current computational models of decision-making can explain precommitment and what those models imply for the mechanisms that underlie precommitment. Here, we focus on integrating those results into the broader picture of decision-making.
Valuation and discounting
Psychologists and economists (and now, neuroeconomists) operationalize the decision-making process through the framework of valuation. Whenever an organism (which we will call an “agent” here, to allow easy translation between simulations and real organisms) faces a choice, each possible outcome is assigned a value. These values are compared, and outcomes with higher values are more likely to be chosen (Glimcher, 2008). Some action-selection systems do not work this way (such as reflexes), but there is a compelling body of evidence that valuation plays a role in many choices. Neural correlates of value-based decision-making have been identified in many parts of the brain (Rangel et al., 2008; Kable and Glimcher, 2009).
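One common way to formalize the idea that higher-valued outcomes are more likely, but not certain, to be chosen is a softmax (Boltzmann) rule. The Python sketch below assumes that rule. The inverse-temperature parameter `beta` and the function name are illustrative and are not part of the framework described above.

```python
import math


def softmax_choice_probs(values, beta=1.0):
    """Map option values to choice probabilities with a softmax rule (an assumed rule).

    beta is a hypothetical inverse temperature: larger beta makes choice more
    deterministic, and beta = 0 makes every option equally likely.
    """
    m = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


probs = softmax_choice_probs([1.0, 2.0, 0.5], beta=2.0)
print([round(p, 3) for p in probs])
```

The highest-valued option receives the largest probability, but the other options keep a nonzero chance of being chosen.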
Rewards lose value as they are delayed further, a phenomenon known as temporal or delay discounting. A discounting function is a quantitative description of this decay in value (Ainslie, 1992; Mazur, 1997; Madden and Bickel, 2010). The discounting function of an individual human subject can be measured empirically with a series of questions (for example, “Would you prefer $30 today or $100 in a year?”), and it is generally stable over time (Ohmura et al., 2006; Takahashi et al., 2007; Jimura et al., 2011).
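Assume a hyperbolic discounting function V = A/(1 + kD), where A is the amount, D the delay, and k the discounting rate. Under that assumption, a single indifference point fixes k. The sketch below treats the example question as though a subject were exactly indifferent between $30 today and $100 in 365 days. That subject and the helper name are hypothetical.

```python
import math


def hyperbolic_k_from_indifference(immediate, delayed, delay_days):
    """Solve immediate = delayed / (1 + k * delay_days) for k (per day).

    Assumes the hyperbolic form V = A / (1 + k * D); hypothetical helper.
    """
    if not (0 < immediate < delayed) or delay_days <= 0:
        raise ValueError("need 0 < immediate < delayed and a positive delay")
    return (delayed / immediate - 1.0) / delay_days


# Hypothetical subject: indifferent between $30 now and $100 in 365 days.
k = hyperbolic_k_from_indifference(30.0, 100.0, 365.0)
print(f"k = {k:.5f} per day, ln k = {math.log(k):.2f}")
```

In practice a single question like this is repeated across many amounts and delays, and k is fit to all of the answers together.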
The simplest discounting function is one that decays exponentially. In exponential discounting, each unit of delay reduces value by the same percentage. When measured empirically, however, the discounting functions of humans and animals are not exponential (Ainslie, 1992; Madden and Bickel, 2010). Instead, they are steeper than exponential at short delays and shallower than exponential at long delays (Figure 1). Hyperbolic functions are often used to fit these curves. For our purposes, it is not critical whether the shape is actually hyperbolic, only that it deviates from exponential in this characteristic way. All non-exponential functions show preference reversals (Strotz, 1955; Frederick et al., 2002): an option preferred today is not necessarily preferred tomorrow.
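The difference in shape can be made concrete by asking what fraction of value survives when a reward is pushed one unit further into the future. The sketch below assumes γ = 0.9 for the exponential curve and k = 1 for the hyperbolic one; both values are arbitrary. Under these assumptions the fraction is constant for the exponential curve but grows with delay for the hyperbolic one.

```python
def exponential(delay, gamma=0.9):
    """Exponential discount factor: each unit of delay multiplies value by gamma."""
    return gamma ** delay


def hyperbolic(delay, k=1.0):
    """Hyperbolic discount factor 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)


def one_step_retention(discount, delay):
    """Fraction of value kept when a reward at `delay` is pushed one unit further away."""
    return discount(delay + 1) / discount(delay)


for d in (0, 1, 5, 20):
    print(d,
          round(one_step_retention(exponential, d), 3),
          round(one_step_retention(hyperbolic, d), 3))
```

With these parameters the exponential retention is 0.9 at every delay. The hyperbolic retention is (1 + d)/(2 + d): it is 0.5 at d = 0 and rises toward 1. The hyperbolic curve therefore falls steeply for nearby rewards and flattens for distant ones.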
Precommitment can be explained as a consequence of preference reversal (Ainslie, 1992; Kurzban, 2010; Figure 1). Fundamentally, precommitment entails two preferences. At one time, the agent prefers a smaller reward available sooner (smaller-sooner, SS) over a larger reward that one must wait for (larger-later, LL). At an earlier time, it prefers LL over SS. In the diagram of Figure 1A, an agent with a preference reversal would prefer SS over LL in situation C, but LL over SS in situation P. In situation P the agent therefore has an incentive to prevent itself from reaching choice C. It can instead go to situation N, in which it has no choice, thereby precommitting to LL.
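The logic of Figure 1A can be sketched for an agent that, at P, correctly anticipates what it will choose at C. Several things here are assumptions for illustration: that foresight, equal delays from P to C and from P to N, and the parameter values (R_S = 100, R_L = 150, D_S = 0, D_L = 1, D_C = 6, k = 1, γ = 0.6).

```python
def exponential(delay, gamma=0.6):
    return gamma ** delay


def hyperbolic(delay, k=1.0):
    return 1.0 / (1.0 + k * delay)


def precommitment_decision(discount, r_ss, r_ll, d_ss, d_ll, d_c):
    """Return (choice at C, choice at P) for the P/C/N state-space of Figure 1A.

    d_c is the delay from P to C (assumed equal to the delay from P to N);
    d_ss and d_ll are the delays from C (or N) to each reward.
    Assumption: at P the agent foresees which option it will take at C.
    """
    at_c = "SS" if r_ss * discount(d_ss) > r_ll * discount(d_ll) else "LL"
    r_c, d_after_c = (r_ss, d_ss) if at_c == "SS" else (r_ll, d_ll)
    value_of_c = r_c * discount(d_c + d_after_c)   # what entering C is worth from P
    value_of_n = r_ll * discount(d_c + d_ll)       # N leads only to LL
    at_p = "N" if value_of_n > value_of_c else "C"
    return at_c, at_p


print("hyperbolic:", precommitment_decision(hyperbolic, 100, 150, 0, 1, 6))
print("exponential:", precommitment_decision(exponential, 100, 150, 0, 1, 6))
```

With these values, both agents take SS at C. Only the hyperbolic agent prefers N from P. The exponential agent's preference cannot reverse, so it heads to C.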
Computational models of precommitment
Temporal difference reinforcement learning (TDRL) is often used to bridge the gap between descriptive theoretical models of decision-making and their neural implementation. Because of its biological plausibility, guaranteed convergence, and power to explain behavior and neural activity (Schultz et al., 1997; Sutton and Barto, 1998; Roesch et al., 2012), TDRL has become a well-established model of value-based decision-making (Montague et al., 1996; Schultz, 1998).
TDRL assumes that an agent can take actions, some of which are rewarded. The goal is to learn to take actions that maximize the reward received (Sutton and Barto, 1998). Distinct situations of the world are represented as states. TDRL aims to estimate the value of each state, defined as the total discounted future reward expected from that state. This is a recursive definition: the value of a state can be defined as the discounted value of the next state plus the reward available in the next state (Bellman, 1957). To learn these values, TDRL calculates a difference on every state transition. That difference is the discounted value of the new state (plus the reward received, if any) minus the value of the old state. It defines a prediction error in the value estimate. When this prediction error, scaled by a learning rate, is added to the value estimate, the estimate is brought closer to the true value. Under appropriate conditions (a stable world, complete exploration, etc.), the estimated value function converges to the true value function. Once the values associated with each state are learned, optimal behavior can be achieved by selecting the available action that leads to the highest-value state. Although the basic model of TDRL is incomplete (Niv et al., 2006; O’Doherty, 2012; van der Meer et al., 2012), it remains the starting point for computational models of decision-making.
*Correspondence:
A. David Redish is currently a professor in the Department of Neuroscience at the University of Minnesota. He has been at the University of Minnesota since 2000, where his lab studies decision-making, particularly issues of covert cognition in rats and failures of decision-making systems in humans.
redish@umn.edu
Precommitment
Taking away a choice from one’s future self in order to enforce one’s present preferences.
Delay discounting
The attenuation in subjective value of rewards that will be delivered in the future. Delay discounting is typically measured by posing decisions between smaller immediate rewards and larger delayed rewards. Subjects with steep discounting will demand a large increase in the magnitude of a reward in order to tolerate a delay in its receipt.
Preference reversal
An instability in preferences over time, such that at one time X is preferred over Y, but at another time Y is preferred over X. Preference reversal is central to impulsivity disorders. For example, drugs are rarely preferred over healthy choices when the choice is viewed from a distance, but they are often preferred when immediately available. It is therefore critical to have mechanisms that enforce the healthy preferences.
Temporal difference reinforcement learning (TDRL)
A standard computational framework that helps to explain behavioral and neural data. TDRL works by calculating a prediction error at each time step, which encodes the difference between expected and actual reward. This prediction error is used to update expectations so that future prediction errors are minimized. A signal resembling this prediction error is coded by midbrain dopamine neurons.
In the standard implementation of TDRL, there is a state transition on every time step. Exponential discounting can therefore be calculated very straightforwardly: the value of the current state is taken to be the value of the next state (plus the reward received, if any) times a constant γ (0 < γ < 1). In this formulation, each unit of time causes the same attenuation of value, which is the definition of exponential discounting. Non-exponential discounting, however, has been difficult to implement in TDRL. There have been a handful of attempts at performing non-exponential (specifically, hyperbolic) discounting within a TDRL model (Daw, 2000, 2003; Kurth-Nelson and Redish, 2009; Alexander and Brown, 2010). We examined precommitment behavior in these four TDRL models (Kurth-Nelson and Redish, 2010).
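The exponential case can be illustrated with a minimal tabular TD(0) learner on a chain of states with a single reward at the end. The sketch follows the formulation above, in which the learning target is γ times the next state's value plus the reward. The chain, learning rate, and episode count are arbitrary choices, and the function name is hypothetical.

```python
def td0_chain(rewards, gamma=0.9, alpha=0.1, episodes=500):
    """Tabular TD(0) on a deterministic chain of states 0 -> 1 -> ... -> n-1.

    rewards[i] is the reward received on entering state i. Following the text,
    the target for V(s) is gamma * (r + V(s')). The last state is terminal and
    is never updated, so its value stays 0.
    """
    n = len(rewards)
    values = [0.0] * n
    for _ in range(episodes):
        for s in range(n - 1):
            s_next = s + 1
            # prediction error: discounted (reward + next value) minus current estimate
            delta = gamma * (rewards[s_next] + values[s_next]) - values[s]
            values[s] += alpha * delta  # move the estimate toward the target
    return values


v = td0_chain([0.0, 0.0, 0.0, 0.0, 1.0])
print([round(x, 4) for x in v])
```

After learning, each step back from the reward multiplies the value by γ (0.9, 0.81, 0.729, 0.6561). That is exactly the exponential decay described above.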
We found that three of these four models produced hyperbolic discounting only in special cases (either across a single state transition, or in an environment with no choices) and were therefore unable to produce precommitment. The remaining model produced hyperbolic discounting in arbitrary state-spaces and was able to produce precommitment. This successfully precommitting model was the μAgents model that we introduced in 2009. In this model, a set of exponentially discounting TDRL agents operate in parallel. Each has a different discounting rate and maintains its own estimate of the value function, and together they approximate hyperbolic discounting behavior (Figure 2; Kurth-Nelson and Redish, 2009). The μAgents model's distributed representation of value can represent more than just the mean expected value of a given state. This allows the model to discount hyperbolically across multiple state transitions, which enables preference reversal and therefore precommitment.
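The core idea behind the distributed representation can be sketched without the learning machinery by averaging the converged values of several exponential discounters. The sketch assumes 20 agents whose γ values sit at the midpoints of evenly spaced intervals in (0, 1), loosely following Figure 2A. Their average is close to 1/(1 + d), because averaging γ^d over γ uniform on [0, 1] gives exactly 1/(d + 1). This is a simplification of the μAgents idea, not its implementation, and all function names are hypothetical.

```python
def mu_agents_discount(delay, n_agents=20):
    """Average of n_agents exponential discount curves (hypothetical helper).

    Assumption: the agents' gammas sit at the midpoints (i + 0.5) / n_agents of
    evenly spaced intervals in (0, 1). Averaging gamma ** delay over gamma drawn
    uniformly from [0, 1] gives exactly 1 / (1 + delay), a hyperbola with k = 1.
    """
    gammas = [(i + 0.5) / n_agents for i in range(n_agents)]
    return sum(g ** delay for g in gammas) / n_agents


def hyperbolic(delay, k=1.0):
    return 1.0 / (1.0 + k * delay)


def single_gamma(delay):
    """One exponential discounter (gamma = 0.5), as in a standard TD agent."""
    return 0.5 ** delay


def preferred(discount, lead, r_ss=100.0, r_ll=150.0):
    """Preferred option when SS is `lead` steps away and LL is one step later."""
    return "SS" if r_ss * discount(lead) > r_ll * discount(lead + 1) else "LL"


for d in (0, 1, 5, 20):
    print(d, round(mu_agents_discount(d), 4), round(hyperbolic(d), 4))

print(preferred(mu_agents_discount, 0), preferred(mu_agents_discount, 5))
print(preferred(single_gamma, 0), preferred(single_gamma, 5))
```

The averaged curve prefers SS when it is imminent but LL when both are five steps away. A single exponential discounter (γ = 0.5 here) prefers SS from both vantage points.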
A TDRL model of precommitment gives us a concrete computational hypothesis with which to explore potential mechanisms by which people choose to precommit. More generally, it is also important to have computational models that describe choice in complex state-spaces (Kurth-Nelson and Redish, 2012). For example, the same model that allows the analysis of precommitment can also be used to analyze bundling. Bundling is another strategy that may be used to overcome an impulsive discounting function (Ainslie and Monterosso, 2003), but unlike precommitment, bundling does not require advance preparation. In bundling, choices are treated as categorical: rather than asking “Do I want to smoke one cigarette?” one asks “Do I want to smoke cigarettes?” Non-exponential discounters will often say yes to the former question and no to the latter, at the same time (Rick and Loewenstein, 2008). This dichotomy suggests a multi-faceted value function. In such a function, different components of the valuation process lead to different answers and to internal conflict that must be resolved before an action can be taken (Kurzban, 2010; van der Meer et al., 2012; Wunderlich et al., 2012).
[Figure 1 image: (A) state-space with states P, C, and N and outcomes SS and LL; (B) Exponential and (C) Hyperbolic panels plotting value against time, marking the times of P, C, SS, and LL, the delays D_C, D_S, and D_L, and the rewards R_S and R_L.]
FIGURE 1 | Precommitment arises from hyperbolic but not exponential discounting. (A) A state-space for precommitment (from Kurth-Nelson and Redish, 2010). The agent first chooses whether to enter state C or state N. From state C, a standard intertemporal choice is available, between a larger reward available later (LL) and a smaller reward available sooner (SS). This choice is outlined with a dashed box. From state N, however, only LL is available. Thus choosing N represents precommitment. (B) In exponential discounting, values decay by the same percentage for each unit of delay, so if SS is preferred at state C, it must also be preferred at state P. (C) In hyperbolic discounting, values decay more steeply close to the outcome, so it is possible for SS to be preferred at state C but for LL to be preferred at state P.
Predictions about precommitment behavior
Computational models allow exploration of parameter spaces. It has been noted for decades that non-exponential (e.g., hyperbolic) discounting leads to preference reversals (Strotz, 1955; Frederick et al., 2002) and to the potential for precommitment (Ainslie, 1992, 2001). Even so, implementing precommitment in a computational model revealed several non-intuitive consequences.
First, the theoretical model predicts that precommitment increases when there is a larger contrast between the SS and LL options. In other words, precommitment will be more favored if LL is very large and very delayed compared to SS. (Of course, if LL is very large but not very delayed, it will simply be preferred over SS at every time point, and precommitment will not be required.) This suggests that, in the case of addiction, encouraging precommitment requires defining the perceived alternative to drug use as a major outcome. Examples include the long-term health and safety of oneself or one’s family members (Heyman, 2009). People are less likely to precommit spontaneously if the only perceived alternative to drug use is a modest outcome, such as saving the money one would have spent on drugs. Contingency management (CM) offers a concrete reward for remaining abstinent from drugs. Recent work suggests that the most effective CM procedures entail working toward a very large concrete reward far in the future (such as a big-screen television; Petry, 2012).
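The contrast prediction can be illustrated with a deterministic hyperbolic agent in the Figure 1A state-space. This is a simplification, not the μAgents simulation used in the original study. The agent is assumed to foresee its own choice at C, with k = 1. The values R_S = 100, D_S = 0, and D_C = 6 are hypothetical, borrowed from the Figure 3 parameters. The function name is also hypothetical.

```python
def hyperbolic(delay, k=1.0):
    return 1.0 / (1.0 + k * delay)


def precommitment_outcome(r_ll, d_ll, r_ss=100.0, d_ss=0.0, d_c=6.0, k=1.0):
    """Classify a deterministic hyperbolic agent's behavior in the P/C/N state-space.

    "LL anyway": LL already wins at C, so there is nothing to precommit against.
    "precommit": SS wins at C, but N (LL only) wins when evaluated from P.
    "no precommit": SS wins at C and also from P.
    """
    ss_at_c = r_ss * hyperbolic(d_ss, k)
    ll_at_c = r_ll * hyperbolic(d_ll, k)
    if ll_at_c >= ss_at_c:
        return "LL anyway"
    value_of_c = r_ss * hyperbolic(d_c + d_ss, k)  # agent foresees taking SS at C
    value_of_n = r_ll * hyperbolic(d_c + d_ll, k)
    return "precommit" if value_of_n > value_of_c else "no precommit"


for r_ll, d_ll in ((105.0, 1.0), (150.0, 1.0), (300.0, 1.0), (300.0, 5.0)):
    print(r_ll, d_ll, precommitment_outcome(r_ll, d_ll))
```

Under these assumptions, the four cases behave differently:

- A modest LL (105 at delay 1) does not make precommitment worthwhile.
- A larger LL (150) does.
- A large LL at a short delay (300 at delay 1) simply wins at C.
- A large, more delayed LL (300 at delay 5) again favors precommitment.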
[Figure 2 image: (A) discounted value (0 to 1) plotted against delay (0 to 20); (B) value of SS and LL plotted against time, marking the times of P and C, for a standard TD model (top) and a TD model with distributed discounting (bottom).]
FIGURE 2 | Distributed discounting enables precommitment in temporal difference learning. (A) Twenty exponential curves with discounting rates spread uniformly between 0 and 1 are shown in black. The average of these curves is shown in red. This average curve closely approximates a hyperbolic function. (B) Standard TD models cannot precommit because, at each state transition, discounting starts over, ensuring that if SS is preferred over LL at the time of C, then it is also preferred at the time of P (top pair of curves). When averaging a set of exponential discount curves, discounting is not reset at each state transition, so preferences can reverse between C and P (bottom pair of curves).
Second, we can predict that an agent’s discounting rate has a complex effect on its ability to precommit. When an agent is highly impulsive (fast discounting rate), it will be highly sensitive to the delay between precommitment and choice. If this delay is small, precommitment is unfavorable, but as this delay increases, the preference for precommitment increases steeply. On the other hand, if an agent is relatively patient (slow discounting rate), it will be largely insensitive to the delay between precommitment and choice, exhibiting at best a mild preference for precommitment for any value of this delay. Thus, the highest overall preference for precommitment appears in the most impulsive agents. On the surface this appears somewhat paradoxical: the people with the strongest preference for an impulsive choice are the ones most likely to employ a strategy that curtails their ability to reach it. Practically, this finding suggests that in addiction, treatment strategies should be tailored to the individual depending on his or her own discounting rate. For fast discounters, inserting more time between precommitment and choice is essential. For slow discounters, the theory predicts that it will not make much of a difference. In fact, for slow-discounting addicts, precommitment may not be a useful strategy at all.
Third, the model predicts that precommitment is highly sensitive to the precise shape of an agent’s discounting function (Figure 3). Our theoretical analysis reveals that two discounting functions fit by nearly identical hyperbolic parameters can exhibit entirely different patterns of precommitment behavior. In particular, the simulations in Kurth-Nelson and Redish (2010) illustrated that precommitment depends on the shape of the tail of the discounting function. When the tail of the discounting function is slightly depressed, precommitment behavior can be abolished for some ranges of reward magnitudes and delays. This finding indicates that treatment could go beyond tailoring to an individual’s best-fit discounting rate. Designing behavioral interventions that work with the shape of the individual’s discounting function may provide further therapeutic power. Additionally, any treatment that modulates the shape of the discounting function may produce large effects on precommitment behavior. For example, boosting serotonin appears to preferentially select slow-discounting components (Tanaka et al., 2007; Schweighofer et al., 2008), which should boost the tail of the discounting curve and improve precommitment.
[Figure 3 image: (A) immediate amount equivalent to a delayed $1000 ($0 to $600) plotted against delay (1, 7, 14, 30, 183, and 365 days) for Subject 1 (ln K = 0.06) and Subject 2 (ln K = 0.46), showing data and best-fit hyperbolic curves; (B) relative value (0 to 0.8) of SS and LL at the choice and precommitment stages for each subject.]
FIGURE 3 | Shape of discounting curve strongly influences precommitment. (A) The actual discounting curves of two individuals are shown in solid lines, and the best-fit hyperbolic curves are shown in dashed lines. These two subjects were both fit by a hyperbolic function with ln(K) of approximately 0 (from a range of −13 to +4 across subjects). (B) Predicted precommitment behavior, based on the actual discounting curve shape of each subject, using the following parameters: D_C = 6 days, D_L = 1 day, D_S = 0, R_L = $150, R_S = $100. Subject 1 is expected to have a modest preference for SS over LL and to be averse to precommitment. Subject 2, by contrast, is expected to have a strong preference for SS over LL but to favor precommitment. (Data from Chopra et al., 2009, used with permission.)
Multiple-systems theory of decision-making
Machine learning research shows that there are different computational approaches to solving the problem of producing behavior that maximizes reward. Neural recordings suggest that each of these algorithms is implemented in the brain, in distinct but overlapping areas.
Impulsivity
Impulsivity can refer to the inability to inhibit ongoing actions, the inability to stick with a long-term plan, or unwillingness to exert effort or to wait for a reward. Each of these phenomena reflects a lack of top-down or executive control. In this paper, we focus on unwillingness to wait for delayed rewards.
Multiple systems
As noted above, TDRL models are incomplete descriptions of the full range of animal (including human) behavior (O’Doherty, 2012). Recent work suggests that at least three behavioral controllers function in tandem: habitual, deliberative, and Pavlovian (Daw et al., 2005, 2011; Dayan et al., 2006; Redish et al., 2008; Fermin et al., 2010; Glascher et al., 2010; Simon and Daw, 2011; Huys et al., 2012; van der Meer et al., 2012; Wunderlich et al., 2012). This leads to the multiple-systems theory of decision-making, which holds that multiple decision-making controllers interact to make decisions. The three controllers work differently:
- Habitual decision-making entails incremental learning of inflexible stimulus-action relationships that are released upon exposure to certain stimuli.
- Deliberative decision-making entails search and evaluation through a representation of the causal structure of the world.
- Pavlovian decision-making entails the release of species-specific approach and avoidance reactions in response to unconditioned or conditioned stimuli.
TDRL is generally taken to be a model of habitual behavior.
There are two basic possibilities for how preference reversals, and therefore precommitment, arise within the context of these multiple systems. The first possibility is that preference reversals are inherent within a single instrumental system. For example, precommitment may arise entirely within the habitual system as a consequence of multiple exponential discount rates operating in parallel. In this case, precommitment would exist even without an interaction between multiple systems. It would occur without conscious anticipation of a preference reversal, entirely as a consequence of differential reinforcement (Ainslie, 1974).
The second possibility is that preference reversals stem from interactions between systems (Bechara et al., 1998; McClure et al., 2004; Dayan et al., 2006; Haidt, 2006; Kurzban, 2010). For example, the deliberative system may discount exponentially, such that LL is preferred from C within the deliberative system. When the agent faces an imminent choice of SS, however, the Pavlovian system adds to the total value of SS so that SS is ultimately chosen. From the vantage of P, SS is not imminent, so the Pavlovian approach is absent and the deliberative system can choose N without hindrance. The result is an apparent reversal of preference. Reversal could also arise from an interaction between the habitual and deliberative systems. Suppose that the habitual system has a faster discounting rate than the deliberative system and dominates at state C (where there is less uncertainty; Daw et al., 2005). Meanwhile, the deliberative system, with a slower discounting rate, dominates at state P. The transition from deliberative to habitual control between P and C would then produce an observed preference reversal.
In this view, the deliberative system would have insight into the expected future impulsive choice of the habitual system. It would choose an action leading to a situation in which the habitual or Pavlovian system does not have the impulsive action available. Interestingly, explicit insight or cognitive recognition of future impulsivity is sometimes assumed to be necessary for precommitment (Baumeister et al., 1994; Kurzban, 2010; Baumeister and Tierney, 2011). The extent to which precommitment depends on insight is, however, unknown at this time.
These two possibilities suggest different ways in which our model of precommitment (Kurth-Nelson and Redish, 2010) fits into the broader context of multiple decision-making systems. In the first case, the TDRL model describes precommitment within the instrumental habit learning system and is agnostic to the interaction of this system with other decision-making systems. In the second case, the model illustrates a general principle: multiple simultaneous processes with different effective discounting rates produce precommitment. These processes may be a mixture of goal-directed and habitual systems, or a mixture of instrumental and Pavlovian systems. In this second case, the model predicts that precommitment is sensitive to the exact shape of the effective discount curve. That prediction implies that precommitment is sensitive to the exact interplay between systems. Particularly intriguing is the role of the deliberative system in shaping precommitment. The deliberative system entails searching through future possibilities. Decisions would then be strongly influenced by the cognitive process of search and by the representations of those possibilities (Kurth-Nelson et al., 2012). If deliberation does play this role in precommitment, then precommitment is likely to interact in complex ways with cognitive processes like working memory.
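The Pavlovian version of the second possibility can be sketched with hypothetical numbers. A deliberative system discounts exponentially with γ = 0.9, and a Pavlovian bonus of 60 is added only to rewards available with zero delay. As in the text, the deliberative system at P is assumed to have insight into the choice that will be made at C. None of these values or function names come from the original study.

```python
def value(reward, delay, gamma=0.9, pavlovian_bonus=0.0):
    """Deliberative exponential value plus a Pavlovian boost for imminent rewards only."""
    deliberative = reward * gamma ** delay
    pavlovian = pavlovian_bonus if delay == 0 else 0.0
    return deliberative + pavlovian


def choice_at_c(bonus, r_ss=100.0, r_ll=150.0, d_ll=1):
    ss = value(r_ss, 0, pavlovian_bonus=bonus)  # SS is imminent at C
    ll = value(r_ll, d_ll, pavlovian_bonus=bonus)
    return "SS" if ss > ll else "LL"


def choice_at_p(bonus, d_c=6, r_ss=100.0, r_ll=150.0, d_ll=1):
    # Assumed insight: the deliberative system foresees what will be chosen at C.
    foreseen = choice_at_c(bonus, r_ss, r_ll, d_ll)
    r_c, d_after = (r_ss, 0) if foreseen == "SS" else (r_ll, d_ll)
    # d_c > 0, so nothing is imminent from P and the Pavlovian term contributes 0.
    value_c = value(r_c, d_c + d_after, pavlovian_bonus=bonus)
    value_n = value(r_ll, d_c + d_ll, pavlovian_bonus=bonus)
    return "N" if value_n > value_c else "C"


print(choice_at_c(60.0), choice_at_p(60.0))
print(choice_at_c(0.0), choice_at_p(0.0))
```

With the bonus, SS wins at C while N wins from P. The result is an apparent preference reversal produced by two systems that each have stable preferences. Without the bonus, LL already wins at C and precommitting offers no advantage.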
Computational psychiatry
Psychiatry is the study of dysfunction within cognitive and decision-making systems. Traditional psychiatry classifies dysfunctions into categories based on external similarities. Recent proposals have suggested that classification would be better served by addressing the underlying dysfunction. The emerging field of computational psychiatry suggests that computational models of underlying neural mechanisms can provide a more principled basis for understanding the nature of dysfunction and choosing the modality of treatment (Redish et al., 2008; Maia and Frank, 2011; Montague et al., 2012).
Impulsivity is a strong candidate for such a trans-disease mechanism (Bickel et al., 2012; Robbins et al., 2012). Impulsive choices underlie several different psychiatric disorders (American Psychiatric Association, 2000; Heyman, 2009; Madden and Bickel, 2010), and there appear to be similar neural bases for impulsivity across these disorders (Dalley et al., 2008; Robbins et al., 2012).
Precommitment is a powerful strategy to combat impulsivity. Although addicts have faster discounting rates on average than non-addicts (Bickel and Marsch, 2001), the distributions of addicts’ and non-addicts’ discounting rates overlap substantially. Furthermore, of all major psychiatric disorders, addiction has by far the highest rate of spontaneous remission (Heyman, 2009), despite the stability of discounting rates over time (Kirby, 2009). This suggests that people can overcome addiction even while their underlying preferences remain impulsive. Precommitment is an ideal strategy for an impulsive agent who wants to make healthy choices. Some people may spontaneously acquire precommitment strategies, while others may benefit from being explicitly instructed in such strategies.
Models of precommitment (Kurth-Nelson and Redish, 2010) make predictions about which precommitment strategies will be most effective in treating impulsivity disorders such as addiction. They predict that more impulsive individuals will be more sensitive to the delay between the option to precommit and the availability of the impulsive choice. They also predict that precommitment depends on the precise shape of the discounting curve, such that two individuals with nearly the same best-fit discounting rate can exhibit very different precommitment behavior.
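The predicted interaction between discounting rate and the delay from P to C can be explored with a deterministic hyperbolic sketch. Several choices here are assumptions for illustration, not the simulations of the original study: the measure V_N/(V_N + V_C), the rates k = 2 (fast) and k = 0.1 (slow), the reward values, and the function name.

```python
def hyperbolic(delay, k):
    return 1.0 / (1.0 + k * delay)


def precommit_share(k, d_c, r_ss=100.0, r_ll=150.0, d_ss=0.0, d_ll=1.0):
    """Relative value of choosing N from P, V_N / (V_N + V_C); above 0.5 favors precommitment.

    Assumptions: hyperbolic discounting with rate k, and an agent that foresees
    the option it would take on reaching C.
    """
    ss_at_c = r_ss * hyperbolic(d_ss, k)
    ll_at_c = r_ll * hyperbolic(d_ll, k)
    r_c, d_after_c = (r_ss, d_ss) if ss_at_c > ll_at_c else (r_ll, d_ll)
    v_c = r_c * hyperbolic(d_c + d_after_c, k)  # value of entering C, seen from P
    v_n = r_ll * hyperbolic(d_c + d_ll, k)      # value of entering N, seen from P
    return v_n / (v_n + v_c)


for k in (2.0, 0.1):  # hypothetical fast and slow discounters
    print(k, [round(precommit_share(k, d_c), 3) for d_c in (0, 1, 2, 6, 20)])
```

For the fast discounter, the preference for N is below 0.5 at D_C = 0 and rises above 0.5 as D_C grows. The slow discounter already takes LL at C, so in this deterministic sketch it is exactly indifferent (0.5) at every delay. The text above describes this case as at best a mild preference.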
The latter is particularly interesting because it is possible to change an individual’s discounting function. For example, Bickel et al. (2012) found that working memory training decreases impulsivity. Others have shown that differences in executive function abilities predict differences in impulsivity (Burks et al., 2009; Romer et al., 2011), which suggests that improving executive function could reduce impulsivity. Conversely, imposing cognitive load makes subjects more impulsive (Vohs and Faber, 2007; Vohs et al., 2008). It is not yet known why working memory training improves discounting functions. It could strengthen long-sighted neural systems, weaken short-sighted neural systems, or change the interplay between the two. Nor is it yet known how these manipulations interact with precommitment as a treatment paradigm for addiction.
Finally, the TDRL model depends on having a state-space in which precommitment is available as an option. This raises the important and poorly explored question of how the brain constructs the state-space. In the context of the issues examined here, the brain needs to recognize that precommitment is available. Working memory and other cognitive resources may be important for flexibly constructing adaptive state-spaces, and this may be an essential part of recovery. Even verbally telling an individual that precommitment is available might be enough to help create the state-space that TDRL or other learning processes could use for precommitment. The ability to form representations of the world that support healthy strategies, even in the face of high underlying impulsivity, may be one of the most important factors in recovery from disorders like addiction.
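The state-space point can be made concrete with a toy sketch loosely in the spirit of the distributed-discounting account reviewed here: an ensemble of exponential discounters whose average valuation is approximately hyperbolic (the mean of γ^d over γ uniform on (0, 1) is 1/(d + 1)). This is an illustration under stated assumptions, not the published model: the 200-value γ grid, the reward sizes and delays, and the function names are hypothetical, and values are computed directly from reward timing rather than learned with temporal-difference updates. The only thing that changes between the last two calls is whether a "precommit" action exists in the start state.

```python
from statistics import fmean

# Hypothetical ensemble of exponential discount factors, evenly spaced on (0, 1).
# The average of gamma**d over gamma ~ Uniform(0, 1) is 1 / (d + 1), so the
# ensemble as a whole discounts approximately hyperbolically (k = 1).
GAMMAS = [(i + 0.5) / 200 for i in range(200)]


def ensemble_value(amount: float, delay: int) -> float:
    """Mean of the exponentially discounted values across the ensemble."""
    return fmean(amount * g ** delay for g in GAMMAS)


def choice_point_pick(a_ss: float = 1.0, a_ll: float = 2.0, d_ll: int = 10) -> str:
    """Option the ensemble takes once SS is actually available (SS now vs LL later)."""
    return "SS" if ensemble_value(a_ss, 0) >= ensemble_value(a_ll, d_ll) else "LL"


def choose_at_start(lead: int, actions: set[str], a_ss: float = 1.0,
                    a_ll: float = 2.0, d_ll: int = 10) -> str:
    """Best action from the start state, `lead` steps before SS is available.

    'wait' leads to the choice point, where the ensemble takes whatever
    choice_point_pick() returns; 'precommit' (only if it exists in the
    state-space) locks in LL, delivered lead + d_ll steps from now.
    """
    if choice_point_pick(a_ss, a_ll, d_ll) == "SS":
        wait_value = ensemble_value(a_ss, lead)
    else:
        wait_value = ensemble_value(a_ll, lead + d_ll)
    values = {"wait": wait_value}
    if "precommit" in actions:
        values["precommit"] = ensemble_value(a_ll, lead + d_ll)
    return max(values, key=values.get)


if __name__ == "__main__":
    print(choice_point_pick())                          # SS
    print(choose_at_start(15, {"wait", "precommit"}))   # precommit
    print(choose_at_start(15, {"wait"}))                # wait
```

With the precommitment option represented in its action set, the ensemble, viewed 15 steps ahead, locks in the larger-later reward; with identical preferences but no such option represented, it waits and then takes the smaller-sooner reward at the choice point.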
Acknowledgments
This work was supported by NIH grant R01 DA024080 (A. David Redish) and by the Max Planck Institute for Human Development as part of the Joint Initiative on Computational Psychiatry and Aging Research between the Max Planck Society and University College London (Zeb Kurth-Nelson). The Wellcome Trust Centre for Neuroimaging is supported by core funding from the Wellcome Trust 091593/Z/10/Z. We thank Warren Bickel for providing data for Figure 3.
References
Ainslie, G. (1974). Impulse control in pigeons. J. Exp. Anal. Behav. 21, 485–489.
Ainslie, G. (1992). Picoeconomics: The Strategic Interaction of Successive Motivational States Within the Person. New York, NY: Cambridge University Press.
Ainslie, G. (2001). Breakdown of Will. New York, NY: Cambridge University Press.
Ainslie, G., and Monterosso, J. R. (2003). Building blocks of self-control: increased tolerance for delay with bundled rewards. J. Exp. Anal. Behav. 79, 37–48.
Alexander, W. H., and Brown, J. W. (2010). Hyperbolically discounted temporal difference learning. Neural Comput. 22, 1511–1527.
American Psychiatric Association. (2000). Diagnostic and Statistical Manual of Mental Disorders, 4th Edn. Washington, DC: APA.
Ariely, D., and Wertenbroch, K. (2002). Procrastination, deadlines, and performance: self-control by precommitment. Psychol. Sci. 13, 219–224.
Baumeister, R. F., Heatherton, T. F., and Tice, D. M. (1994). Losing Control: How and Why People Fail at Self-Regulation. San Diego, CA: Academic Press.
Baumeister, R. F., and Tierney, J. (2011). Willpower: Rediscovering the Greatest Human Strength. New York, NY: Penguin Press.
Bechara, A., Nader, K., and van der Kooy, D. (1998). A two-separate-motivational-systems hypothesis of opioid addiction. Pharmacol. Biochem. Behav. 59, 1–17.
Bellman, R. (1957). Dynamic Programming. Princeton, NJ: Princeton University Press.
Bickel, W. K., Jarmolowicz, D. P., Mueller, E. T., Koffarnus, M. N., and Gatchalian, K. M. (2012). Excessive discounting of delayed reinforcers as a trans-disease process contributing to addiction and other disease-related vulnerabilities: emerging evidence. Pharmacol. Ther. 134, 287–297.
Bickel, W. K., and Marsch, L. A. (2001). Toward a behavioral economic understanding of drug dependence: delay discounting processes. Addiction 96, 73–86.
Burks, S. V., Carpenter, J. P., Goette, L., and Rustichini, A. (2009). Cognitive skills affect economic preferences, strategic behavior, and job attachment. Proc. Natl. Acad. Sci. U.S.A. 106, 7745–7750.
Chopra, M. P., Landes, R. D., Gatchalian, K. M., Jackson, L. C., Bickel, W. K., Buchhalter, A. R., et al. (2009). Buprenorphine medication versus voucher contingencies in promoting abstinence from opioids and cocaine. Exp. Clin. Psychopharmacol. 17, 226–236.
Dalley, J. W., Mar, A. C., Economidou, D., and Robbins, T. W. (2008). Neurobehavioral mechanisms of impulsivity: fronto-striatal systems and functional neurochemistry. Pharmacol. Biochem. Behav. 90, 250–260.
Daw, N. (2000). Behavioral considerations suggest an average reward TD model of the dopamine system. Neurocomputing 32, 679–684.
Daw, N. D. (2003). Reinforcement Learning Models of the Dopamine System and their Behavioral Implications, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., and Dolan, R. J. (2011). Model-based influences on humans’ choices and striatal prediction errors. Neuron 69, 1204–1215.
Daw, N. D., Niv, Y., and Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–1711.
Dayan, P., Niv, Y., Seymour, B., and Daw, N. D. (2006). The misbehavior of value and the discipline of the will. Neural Netw. 19, 1153–1160.
Fermin, A., Yoshida, T., Ito, M., Yoshimoto, J., and Doya, K. (2010). Evidence for model-based action planning in a sequential finger movement task. J. Mot. Behav. 42, 371–379.
Frederick, S., Loewenstein, G., and O’Donoghue, T. (2002). Time discounting and time preference: a critical review. J. Econ. Lit. 40, 351–401.
Gläscher, J., Daw, N., Dayan, P., and O’Doherty, J. P. (2010). States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66, 585–595.
Glimcher, P. W. (2008). Neuroeconomics: Decision Making and the Brain. San Diego, CA: Academic Press.
Haidt, J. (2006). The Happiness Hypothesis: Finding Modern Truth in Ancient Wisdom. New York, NY: Basic Books.
Heyman, G. M. (2009). Addiction: A Disorder of Choice. Cambridge, MA: Harvard University Press.
Huys, Q. J., Eshel, N., O’Nions, E., Sheridan, L., Dayan, P., and Roiser, J. P. (2012). Bonsai trees in your head: how the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Comput. Biol. 8, e1002410. doi: 10.1371/journal.pcbi.1002410
Jimura, K., Myerson, J., Hilgard, J., Keighley, J., Braver, T. S., and Green, L. (2011). Domain independence and stability in young and older adults’ discounting of delayed rewards. Behav. Processes 87, 253–259.
Kable, J. W., and Glimcher, P. W. (2009). The neurobiology of decision: consensus and controversy. Neuron 63, 733–745.
Kirby, K. N. (2009). One-year temporal stability of delay-discount rates. Psychon. Bull. Rev. 16, 457–462.
Kurth-Nelson, Z., Bickel, W., and Redish, A. D. (2012). A theoretical account of cognitive effects in delay discounting. Eur. J. Neurosci. 35, 1052–1064.
Kurth-Nelson, Z., and Redish, A. D. (2009). Temporal-difference reinforcement learning with distributed representations. PLoS ONE 4, e7362. doi: 10.1371/journal.pone.0007362
Kurth-Nelson, Z., and Redish, A. D. (2010). A reinforcement learning model of precommitment in decision making. Front. Behav. Neurosci. 4:184. doi: 10.3389/fnbeh.2010.00184
Kurth-Nelson, Z., and Redish, A. D. (2012). “Modeling decision-making systems in addiction,” in Computational Neuroscience of Drug Addiction, eds B. Gutkin and S. Ahmed (New York, NY: Springer), 163–188.
Kurzban, R. (2010). Why Everyone (Else) Is a Hypocrite: Evolution and the Modular Mind. Princeton, NJ: Princeton University Press.
Ladouceur, R., Blaszczynski, A., and Lalande, D. R. (2012). Pre-commitment in gambling: a review of the empirical evidence. Int. Gambl. Stud. 1–16.
Madden, G. J., and Bickel, W. K. (2010). Impulsivity: The Behavioral and Neurological Science of Discounting. Washington, DC: American Psychological Association.
Maia, T. V., and Frank, M. J. (2011). From reinforcement learning models to psychiatric and neurological disorders. Nat. Neurosci. 14, 154–162.
Mazur, J. E. (1997). Choice, delay, probability, and conditioned reinforcement. Anim. Learn. Behav. 25, 131.
McClure, S. M., Laibson, D. I., Loewenstein, G., and Cohen, J. D. (2004). Separate neural systems value immediate and delayed monetary rewards. Science 306, 503–507.
Montague, P. R., Dayan, P., and Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947.
Montague, P. R., Dolan, R. J., Friston, K. J., and Dayan, P. (2012). Computational psychiatry. Trends Cogn. Sci. (Regul. Ed.) 16, 72–80.
Niv, Y., Joel, D., and Dayan, P. (2006). A normative perspective on motivation. Trends Cogn. Sci. (Regul. Ed.) 10, 375–381.
O’Doherty, J. P. (2012). Beyond simple reinforcement learning: the computational neurobiology of reward-learning and valuation. Eur. J. Neurosci. 35, 987–990.
Ohmura, Y., Takahashi, T., Kitamura, N., and Wehr, P. (2006). Three-month stability of delay and probability discounting measures. Exp. Clin. Psychopharmacol. 14, 318–328.
Petry, N. M. (2012). Contingency Management for Substance Abuse Treatment: A Guide to Implementing this Evidence-Based Practice. New York, NY: Routledge.
Rachlin, H., and Green, L. (1972). Commitment, choice and self-control. J. Exp. Anal. Behav. 17, 15–22.
Rangel, A., Camerer, C., and Montague, P. R. (2008). A framework for studying the neurobiology of value-based decision making. Nat. Rev. Neurosci. 9, 545–556.
Redish, A. D., Jensen, S., and Johnson, A. (2008). A unified framework for addiction: vulnerabilities in the decision process. Behav. Brain Sci. 31, 415–437.
Rick, S., and Loewenstein, G. (2008). Intangibility in intertemporal choice. Phil. Trans. Roy. Soc. B 363, 3813–3824.
Robbins, T. W., Gillan, C. M., Smith, D. G., de Wit, S., and Ersche, K. D. (2012). Neurocognitive endophenotypes of impulsivity and compulsivity: towards dimensional psychiatry. Trends Cogn. Sci. (Regul. Ed.) 16, 81–91.
Roesch, M. R., Esber, G. R., Li, J., Daw, N. D., and Schoenbaum, G. (2012). Surprise! Neural correlates of Pearce-Hall and Rescorla-Wagner coexist within the brain. Eur. J. Neurosci. 35, 1190–1200.
Romer, D., Betancourt, L. M., Brodsky, N. L., Giannetta, J. M., Yang, W., and Hurt, H. (2011). Does adolescent risk taking imply weak executive function? A prospective study of relations between working memory performance, impulsivity, and risk taking in early adolescence. Dev. Sci. 14, 1119–1133.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27.
Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science 275, 1593–1599.
Schweighofer, N., Bertin, M., Shishida, K., Okamoto, Y., Tanaka, S. C., Yamawaki, S., and Doya, K. (2008). Low-serotonin levels increase delayed reward discounting in humans. J. Neurosci. 28, 4528–4532.
Simon, D. A., and Daw, N. D. (2011). Neural correlates of forward planning in a spatial decision task in humans. J. Neurosci. 31, 5526–5539.
Strotz, R. H. (1955). Myopia and inconsistency in dynamic utility maximization. Rev. Econ. Stud. 23, 165–180.
Sutton, R. S., and Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
Takahashi, T., Furukawa, A., Miyakawa, T., Maesato, H., and Higuchi, S. (2007). Two-month stability of hyperbolic discount rates for delayed monetary gains in abstinent inpatient alcoholics. Neuro Endocrinol. Lett. 28, 131–136.
Tanaka, S. C., Schweighofer, N., Asahi, S., Shishida, K., Okamoto, Y., Yamawaki, S., and Doya, K. (2007). Serotonin differentially regulates short- and long-term prediction of rewards in the ventral and dorsal striatum. PLoS ONE 2, e1333. doi: 10.1371/journal.pone.0001333
van der Meer, M. A. A., Kurth-Nelson, Z., and Redish, A. D. (2012). Information processing in decision-making systems. Neuroscientist 18, 342–359.
Vohs, K. D., and Faber, R. J. (2007). Spent resources: self-regulatory resource availability affects impulse buying. J. Consum. Res. 33, 537–547.
Vohs, K. D., Nelson, N. M., Baumeister, R. F., Tice, D. M., Schmeichel, B. J., and Twenge, J. M. (2008). Making choices impairs subsequent self-control: a limited-resource account of decision making, self-regulation, and active initiative. J. Pers. Soc. Psychol. 94, 883–898.
Wohl, M. J. A., Lyon, M., Donnelly, C. L., Young, M. M., Matheson, K., and Anisman, H. (2008). Episodic cessation of gambling: a numerically aided phenomenological assessment of why gamblers stop playing in a given session. Int. Gambl. Stud. 8, 249–263.
Wunderlich, K., Dayan, P., and Dolan, R. J. (2012). Mapping value based planning and extensively trained choice in the human brain. Nat. Neurosci. 15, 786–791.
Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Received: 25 June 2012; accepted: 04 September 2012; published online: 08 October 2012.
Citation: Kurth-Nelson Z and Redish AD (2012) Don’t let me do that! – models of precommitment. Front. Neurosci. 6:138. doi: 10.3389/fnins.2012.00138
Copyright © 2012 Kurth-Nelson and Redish. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
... Because delay discounting varies across the lifespan [27], sham group, anodal group and cathodal group were well balanced with respect to age (One-way ANOVA, F (2,267) = 0.256, p = 0.775, η 2 p = 0.002), which eliminated effects of age on intertemporal preference. In addition, because the demand for precommitment is not independent on individual differences in delay discounting [13,[28][29][30], we performed One-way ANOVA on delay discounting as a factor to test the main effects of stimulation, delay discounting was measured using the delay discounting questionnaire of Kirby et al. (1996). The results suggested the stimulation over the FPC did not alter participants' delay discounting. ...
... It is also important to note that the current data revealed no effects of tDCS over the FPC on impulse control. On the one hand, the willingness to precommit was not independent of individual differences in delay discounting [13,[28][29][30]37], therefore, the well-balanced distribution of delay discounting in the sham group, anodal group and cathodal group excluded potential confounds. On the other hand, individual delay discounting levels are related to lateral prefrontal cortex (DLPFC) activity [32,[38][39][40], and the consistency in individual delay discounting in the sham group, anodal group and cathodal group indicated that tDCS exerted its effects by modulating neural activity in the FPC rather than in other brain regions that were not activated by the current task. ...
... Self-report is a potentially cost effective and efficient way for the measurement of the expected value of precommitment. Based on the fact that evaluating the expected value of precommitment is a kind of metacognitive functioning [29,30], the questionnaires could be designed by the lights of the literature on the measurement of metacognitive skills [50][51][52], which could be further explored in future research. ...
Article
Caving into temptation leads to deviation from the planned path, which reduces our performance, adds trouble to our daily life, and can even bring about psychiatric disorders. Precommitment is an effective way to remedy the failure of willpower by removing the tempting short-term option. This paper aims to test the neural mechanisms of precommitment through a monetary task that excluded the interference of heterogeneous individual preferences and complements present researches. We examined whether transcranial direct current stimulation (tDCS) over the frontopolar cortex (FPC) could affect the demand for precommitment. The participants were required to make a decision regarding whether they were willing to precommit to binding later-lar ger rewards and remove the sooner-smaller rewards. Three conditions, including no precommitment, loose precommitment and strict precommitment, were established to perform a comprehensive investigation. We found that tDCS over the FPC altered the demand for precommitment in the condition involving loose precommitment with the control of delay discounting, specifically, anodal stimulation led to more precommitment, whereas cathodal stimulation reduced the demand for precommitment. Our findings established a causal correlation between the FPC and willingness to precommit and suggested a feasible method to enhance self-control in addition to exercising willpower.
... For example, only if smokers anticipate that they will give in to smoking a cigarette when going to a party can they avoid such situations where their capacity to resist immediate temptations would not be sufficient. Conceptually, prospective decisions where decision makers voluntarily restrict their access to temptations presuppose metacognitive awareness of one's preferences for delayed versus immediate rewards 7,21,23 . Recent findings suggest that better metacognitive accuracy indeed predicts a higher likelihood of restricting one's access to immediate rewards when anticipating potential preference reversals 24,25 . ...
... From this perspective, metacognitive awareness of one's preferences for delayed versus immediate rewards represents the precondition for the ability to accurately anticipate potential preference reversals from delayed to immediate rewards. As the anticipation of potential preference reversals belongs to the driving forces of precommitment decisions 7,23 , deficits in metacognition may thus reduce the sensitivity to preference reversals during precommitment decisions. Metacognitive deficits in nicotine addiction might also challenge the theory of rational addiction according to which substance abuse can be understood as rational and farsighted maximization of an individual's utility 26,27 . ...
Article
Full-text available
Deficits in impulse control belong to the core profile of nicotine dependence. Smokers might thus benefit from voluntarily self-restricting their access to the immediate temptation of nicotine products (precommitment) in order to avoid impulse control failures. However, little is known about how smokers’ willingness to engage in voluntary self-restrictions is determined by metacognitive insight into their general preferences for immediate over delayed rewards. Here, with a series of monetary intertemporal choice tasks, we provide empirical evidence for reduced metacognitive accuracy in smokers relative to non-smokers and show that smokers overestimate the subjective value of delayed rewards relative to their revealed preferences. In line with the metacognitive deficits, smokers were also less sensitive to the risk of preference reversals when deciding whether or not to restrict their access to short-term financial rewards. Taken together, the current findings suggest that deficits not only in impulse control but also in metacognition may hamper smokers’ resistance to immediate rewards and capacity to pursue long-term goals.
... In more natural settings, humans tend to learn from experience that our preferences reverse over time and that we often stray from our long-term plans. As a result, we sometimes develop strategies to precommit to a specific course of action, making an impulsive drift more difficult or costly 82 . In doing that, we may take into account future changes in our own motivational state 83,84 . ...
Article
Full-text available
When presented with the option of either an immediate benefit or a larger, later reward, we may behave impatiently by choosing instant gratification. Nonetheless, when we can make the same decision ahead of time and plan for the future, we tend to make more patient choices. Here, we explored whether great apes share this core feature of human decision-making, often referred to as dynamic inconsistency. We found that orangutans, bonobos, and gorillas tended to act impatiently and with considerable variability between individuals when choosing between an immediate reward and a larger-later reward, which is a commonly employed testing method in the field. However, with the inclusion of a front-end delay for both alternatives, their decisions became more patient and homogeneous. These results show that great apes are dynamically inconsistent. They also suggest that, when choosing between future outcomes, they are more patient than previously reported. We advocate for the inclusion of diverse time ranges in comparative research, especially considering the intertwinement of intertemporal choices and future-oriented behavior.
... Self-report measures assess a generalized subjective judgment about how frequently one behaves in a self-controlled manner (i.e., it measures the outcome of self-control processes), but self-reports may not differentiate between different mechanisms underlying selfcontrolled behavior. Whereas interventive self-control strategies like craving regulation or the generation of anticipatory emotions primarily play a role when one faces a temptation and cannot avoid a self-control conflict, there is evidence that self-control in real-life contexts often depends on the formation of beneficial habits (Galla and Duckworth, 2015;Gillebaart and de Ridder, 2015;De Ridder and Gillebaart, 2017;Gillebaart and Adriaanse, 2017) or preventive precommitment strategies that serve to avoid temptations (Kurth-Nelson and Redish, 2012;Soutschek et al., 2017;Studer et al., 2019). This may explain why self-reported trait self-control and interventive strategies like anticipatory emotions are often not strongly correlated. ...
Article
Full-text available
Self-control is typically attributed to “cold” cognitive control mechanisms that top-down influence “hot” affective impulses or emotions. In this study we tested an alternative view, assuming that self-control also rests on the ability to anticipate emotions directed toward future consequences. Using a behavioral within-subject design including an emotion regulation task measuring the ability to voluntarily engage anticipated emotions towards an upcoming event and a self-control task in which subjects were confronted with a variety of everyday conflict situations, we examined the relationship between self-control and anticipated emotions. We found that those individuals (n = 33 healthy individuals from the general population) who were better able to engage anticipated emotions to an upcoming event showed stronger levels of self-control in situations where it was necessary to resist short-term temptations or to endure short-term aversions to achieve long-term goals. This finding suggests that anticipated emotions may play a functional role in self-control-relevant deliberations with respect to possible future consequences and are not only inhibited top-down as implied by “dual system” views on self-control.
... A paradigmatic example of proactive preventive self-control are precommitment strategies, which serve to reduce the likelihood that one will yield to an anticipated future temptation (Kuhl & Goschke, 1994;Kurth-Nelson & Redish, 2012;Soutschek et al., 2017;Studer et al., 2019). In the most extreme case, one can try to prevent self-control conflicts altogether by avoiding situations involving an anticipated temptation, or, if this is not possible, to restrict the space of one's future behavioral options. ...
Article
Full-text available
Self-control denotes the ability to override current desires to render behavior consistent with long-term goals. A key assumption is that self-control is required when short-term desires are transiently stronger (more preferred) than long-term goals and people would yield to temptation without exerting self-control. We argue that this widely shared conception of self-control raises a fundamental yet rarely discussed conceptual paradox: How is it possible that a person most strongly desires to perform a behavior (e.g., eat chocolate) and at the same time desires to recruit self-control to prevent themselves from doing it? A detailed analysis reveals that three common assumptions about self-control cannot be true simultaneously. To avoid the paradox, any coherent theory of self-control must abandon either the assumption (a) that recruitment of self-control is an intentional process, or (b) that humans are unitary agents, or (c) that self-control consists in overriding the currently strongest desire. We propose a taxonomy of different kinds of self-control processes that helps organize current theories according to which of these assumptions they abandon. We conclude by outlining unresolved questions and future research perspectives raised by different conceptions of self-control and discuss implications for the question of whether self-control can be considered rational.
... The MDP framework, which allows rewards of various magnitudes to be realized at different points in time, is well suited for modeling intertemporal choice. Kurth-Nelson and Redish [25,26] explore a model of precommitment in decision making as a means of preventing impulsive defections. Their model addresses the initial decision to commit rather than the ongoing possibility of defection. ...
Preprint
Full-text available
Individuals are often faced with temptations that can lead them astray from long-term goals. We're interested in developing interventions that steer individuals toward making good initial decisions and then maintaining those decisions over time. In the realm of financial decision making, a particularly successful approach is the prize-linked savings account: individuals are incentivized to make deposits by tying deposits to a periodic lottery that awards bonuses to the savers. Although these lotteries have been very effective in motivating savers across the globe, they are a one-size-fits-all solution. We investigate whether customized bonuses can be more effective. We formalize a delayed-gratification task as a Markov decision problem and characterize individuals as rational agents subject to temporal discounting, a cost associated with effort, and fluctuations in willpower. Our theory is able to explain key behavioral findings in intertemporal choice. We created an online delayed-gratification game in which the player scores points by selecting a queue to wait in and then performing a series of actions to advance to the front. Data collected from the game is fit to the model, and the instantiated model is then used to optimize predicted player performance over a space of incentives. We demonstrate that customized incentive structures can improve an individual's goal-directed decision making.
Article
Full-text available
Self-control describes the processes by which individuals control their habits, desires, and impulses in the service of long-term goals. Research has identified important components of self-control and proposed theoretical frameworks integrating these components (e.g., Inzlicht et al., 2021; Kotabe & Hofmann, 2015). In our perspective, these frameworks, however, do not yet fully incorporate important metacognitive aspects of self-control. We therefore introduce a framework explicating the role of metacognition for self-control. This framework extends existing frameworks, primarily from the domains of self-regulated learning and problem-solving (e.g., Schraw & Moshman, 1995; Zimmerman, 2000), and integrates past and contemporary research and theorizing on self-control that involves aspects of metacognition. It considers two groups of metacognitive components, namely, (a) individual metacognitive characteristics, that is a person’s declarative, procedural, and conditional metacognitive knowledge about self-control, as well as their self-awareness (or metacognitive awareness), and (b) metacognitive regulatory processes that unfold before a self-control conflict (forethought and prevention), when a self-control conflict is identified, during a self-control conflict (regulation and monitoring), and after a self-control conflict (reflection and evaluation). The proposed framework integrates existing research and will be useful for highlighting new directions for research on the role of metacognition for self-control success and failure.
Chapter
Full-text available
Addiction appears to contradict expected utility theory and has therefore been the subject of many re-examinations of motivation. Addiction is variously said to arise from and/or be maintained by conditioning, habit learning (as distinct from the goal-directed kind), the elicitation of counterfeit reward in the midbrain, accelerated delay discounting, hyperbolic delay discounting, and unspecified sorts of disease or compulsion that imply addiction is not motivated at all. Each of these models has some roots in observation but each has problems, particularly in accounting for addictions that do not need a neurophysiologically active agent, such as to gambling or video games. I propose that an implication of hyperbolic delay discounting-recursive self-prediction-adds necessary mechanisms for addiction within a motivational framework. An addict's "force of habit" may be motivated by what amounts to accumulated consumption capital within an endogenous reward process. In a recursive motivational model the addict's impaired responsibility is more like bankruptcy than disease.
Article
We review and synthesize recent neurophysiological studies of decision making in humans and nonhuman primates. From these studies, the basic outline of the neurobiological mechanism for primate choice is beginning to emerge. The identified mechanism is now known to include a multicomponent valuation stage, implemented in ventromedial prefrontal cortex and associated parts of striatum, and a choice stage, implemented in lateral prefrontal and parietal areas. Neurobiological studies of decision making are beginning to enhance our understanding of economic and social behavior as well as our understanding of significant health disorders where people's behavior plays a key role.
Book
In the years since it first published, Neuroeconomics: Decision Making and the Brain has become the standard reference and textbook in the burgeoning field of neuroeconomics. The second edition, a nearly complete revision of this landmark book, will set a new standard. This new edition features five sections designed to serve as both classroom-friendly introductions to each of the major subareas in neuroeconomics, and as advanced synopses of all that has been accomplished in the last two decades in this rapidly expanding academic discipline. The first of these sections provides useful introductions to the disciplines of microeconomics, the psychology of judgment and decision, computational neuroscience, and anthropology for scholars and students seeking interdisciplinary breadth. The second section provides an overview of how human and animal preferences are represented in the mammalian nervous systems. Chapters on risk, time preferences, social preferences, emotion, pharmacology, and common neural currencies-each written by leading experts-lay out the foundations of neuroeconomic thought. The third section contains both overview and in-depth chapters on the fundamentals of reinforcement learning, value learning, and value representation. The fourth section, "The Neural Mechanisms for Choice,? integrates what is known about the decision-making architecture into state-of-the-art models of how we make choices. The final section embeds these mechanisms in a larger social context, showing how these mechanisms function during social decision-making in both humans and animals. The book provides a historically rich exposition in each of its chapters and emphasizes both the accomplishments and the controversies in the field. A clear explanatory style and a single expository voice characterize all chapters, making core issues in economics, psychology, and neuroscience accessible to scholars from all disciplines. 
The volume is essential reading for anyone interested in neuroeconomics in particular or decision making in general.
Article
Neuroeconomics is the study of the neurobiological and computational basis of value-based decision making. Its goal is to provide a biologically based account of human behaviour that can be applied in both the natural and the social sciences. This Review proposes a framework to investigate different aspects of the neurobiology of decision making. The framework allows us to bring together recent findings in the field, highlight some of the most important outstanding problems, define a common lexicon that bridges the different disciplines that inform neuroeconomics, and point the way to future applications.
Article
The hyperbolic-decay model is a mathematical expression of the relation between delay and reinforcer value. The model has been used to predict choices in discrete-trial experiments on delay-amount tradeoffs, on preference for variable over fixed delays, and on probabilistic reinforcement. Experiments manipulating the presence or absence of conditioned reinforcers on trials that end without primary reinforcement have provided evidence that the hyperbolic-decay model actually predicts the strength of conditioned reinforcers rather than the strength of delayed primary reinforcers. The model states that the strength of a conditioned reinforcer is inversely related to the time spent in its presence before a primary reinforcer is delivered. A possible way to integrate the model with Grace's (1994) contextual-choice model for concurrent-chain schedules is presented. Also discussed are unresolved difficulties in determining exactly when a stimulus will or will not serve as a conditioned reinforcer.
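The hyperbolic-decay model is commonly written as V = A / (1 + KD), where A is the reinforcer amount, D is the delay, and K is a free discounting parameter. The sketch below, offered only as an illustration, shows why this form matters for precommitment: under hyperbolic discounting, preference between a smaller-sooner and a larger-later reward can reverse as both rewards draw nearer. The amounts (5 and 10), the 5-unit gap between them, and K = 1 are hypothetical values chosen for the example, not parameters from the cited work; the helper names are likewise illustrative.

```python
def hyperbolic_value(amount: float, delay: float, k: float) -> float:
    """Discounted value under the hyperbolic-decay form V = A / (1 + k * D)."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return amount / (1.0 + k * delay)


def preferred_option(t_before: float, k: float = 1.0) -> str:
    """Compare a smaller-sooner (SS) and a larger-later (LL) reward, viewed
    t_before time units ahead of the SS reward.

    Hypothetical setup: SS = 5 units, LL = 10 units delivered 5 time units after SS.
    Ties are reported as "larger-later".
    """
    ss = hyperbolic_value(5.0, t_before, k)         # smaller-sooner
    ll = hyperbolic_value(10.0, t_before + 5.0, k)  # larger-later
    return "smaller-sooner" if ss > ll else "larger-later"


if __name__ == "__main__":
    # Far from the choice point the larger-later reward is preferred;
    # close to it, the smaller-sooner reward wins (a preference reversal).
    for t in (10.0, 0.0):
        print(f"t_before={t}: {preferred_option(t)}")
```

With these illustrative numbers the two values cross at t_before = 4 (5/5 = 10/10), so an agent that foresees the reversal from a distance has a reason to remove the smaller-sooner option in advance, which is the behavior the precommitment literature studies. Per the abstract above, D may be better read as time spent in the presence of a conditioned reinforcer than as delay to the primary reinforcer; the arithmetic is the same either way.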
Article
Making choices, responding actively instead of passively, restraining impulses, and other acts of self-control and volition all draw on a common resource that is limited and renewable, akin to strength or energy. After an act of choice or self-control, the self's resources have been expended, producing the condition of ego depletion. In this state, the self is less able to function effectively, such as by regulating itself or exerting volition. Effects of ego depletion appear to reflect an effort to conserve remaining resources rather than full exhaustion, although in principle full exhaustion is possible. This versatile but limited resource is crucial to the self's optimal functioning, and the pervasive need to conserve it may result in the commonly heavy reliance on habit, routine, and automatic processes.