Punishment resistance for cocaine is associated with inflexible habits in rats

Bradley O. Jones (a), Morgan S. Paladino (b), Adelis M. Cruz (b), Haley F. Spencer (b), Payton L.
Kahanek (b), Lauren N. Scarborough (b), Sandra F. Georges (b), Rachel J. Smith (a,b,*)
(a) Institute for Neuroscience, Texas A&M University, College Station, TX, USA
(b) Department of Psychological and Brain Sciences, Texas A&M University, College Station, TX,
USA
Abstract
Addiction is characterized by continued drug use despite negative consequences. In an animal
model, a subset of rats continues to self-administer cocaine despite footshock consequences,
showing punishment resistance. We sought to test the hypothesis that punishment resistance
arises from failure to exert goal-directed control over habitual cocaine seeking. While habits are
not inherently permanent or maladaptive, continued use of habits under conditions that should
encourage goal-directed control makes them maladaptive and inflexible. We trained male and
female Sprague Dawley rats on a seeking-taking chained schedule of cocaine self-administration.
We then exposed them to four days of punishment testing in which footshock was delivered
randomly on one-third of trials. Before and after punishment testing (four days pre-punishment
and ≥ four days post-punishment), we assessed whether cocaine seeking was goal-directed or
habitual using outcome devaluation via cocaine satiety. We found that punishment resistance
was associated with continued use of habits, whereas punishment sensitivity was associated with
increased goal-directed control. Although punishment resistance for cocaine was not predicted by
habitual responding pre-punishment, it was associated with habitual responding post-punishment.
In parallel studies of food self-administration, we similarly observed that punishment resistance
was associated with habitual responding post-punishment but not pre-punishment in males,
although it was associated with habitual responding both pre- and post-punishment in females,
indicating that punishment resistance was predicted by habitual responding in food-seeking
females. These findings indicate that punishment resistance is related to habits that have become
inflexible and persist under conditions that should encourage a transition to goal-directed behavior.
Keywords
Addiction; Compulsive; Footshock; Devaluation; Self-administration; Food
Published in final edited form as: Addict Neurosci. 2024 June; 11: doi:10.1016/j.addicn.2024.100148.
HHS Public Access author manuscript; available in PMC 2024 June 10.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
*Corresponding author. rachelsmith@tamu.edu (R.J. Smith).
CRediT authorship contribution statement
Bradley O. Jones: Conceptualization, Formal analysis, Investigation, Writing – original draft, Writing – review & editing. Morgan S.
Paladino: Investigation, Writing – review & editing. Adelis M. Cruz: Investigation, Writing – review & editing. Haley F. Spencer:
Investigation. Payton L. Kahanek: Investigation. Lauren N. Scarborough: Investigation. Sandra F. Georges: Investigation. Rachel
J. Smith: Conceptualization, Formal analysis, Funding acquisition, Investigation, Supervision, Writing – original draft, Writing –
review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.
Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.addicn.2024.100148.
1. Introduction
Addiction is characterized by compulsive drug seeking and continued drug use despite
negative consequences. In an animal model of compulsive drug use, a subset of rats
continues to self-administer cocaine despite footshock consequences, indicating punishment
resistance [1–4]. Compulsive drug use has been theorized to stem from a loss of control over
habitual behavior, making habits maladaptive and inflexible [5–11]. Although habits are
considered automatic and insensitive to changes in outcome value, they are not necessarily
permanent or insensitive to consequences. Rather, habitual behavior is typically flexible in
that it is overridden by goal-directed control under conditions of punishment or changes in
context [9,12]. In contrast, habitual responding that persists despite conditions that should
encourage goal-directed control may indicate that habits have become maladaptive and
inflexible. Here we sought to directly assess the relationship between habitual cocaine
seeking and punishment resistance in rats.
The role of cocaine-seeking habits in the development of punishment resistance has
been unclear, partially due to limited methods for assessing habitual responding for
intravenous (IV) cocaine. We recently developed a procedure to discriminate goal-directed
and habitual responding in rats self-administering IV cocaine using outcome devaluation
via satiety [13]. Goal-directed behavior is performed in direct pursuit of the outcome,
and therefore sensitive to outcome devaluation, whereas habitual behavior is automatically
elicited by conditioned stimuli and insensitive to outcome devaluation [14–16]. Using this
novel outcome devaluation procedure for IV cocaine, we found that bilateral lesions of
dorsolateral striatum (DLS) or dorsomedial striatum (DMS) caused goal-directed or habitual
cocaine responding, respectively, similar to previous work with food rewards [13,15,17–
22]. We also found that a random ratio (RR20) schedule of reinforcement biased toward
goal-directed responding, while a random interval (RI60) schedule biased toward habitual
responding, although this biasing effect was weaker for cocaine as compared to food rewards
[13,15,16,18,21–24]. An advantage of this procedure is that it elicits devaluation temporarily
without the need for additional training, easily allowing repeated testing at different time
points (e.g., before and after footshock punishment testing).
While habitual responding develops in the majority of rats after extended training on cocaine
self-administration [25,26], punishment resistance develops in only a subset of rats [2,3,27].
DLS is necessary for habitual responding for cocaine and is progressively recruited over
extended cocaine training [13,25,28–32]. DLS may also play a role in punishment resistance
for cocaine, considering that DLS inactivation increased sensitivity to footshock punishment
[33]. Similar parallels between habits and punishment resistance have been observed for
alcohol. Extended alcohol exposure increased habitual responding and DLS control of self-
administration, as well as punishment resistance despite footshock [34–39]. Animals whose
alcohol seeking had become habitual and DLS-dependent after extended training showed
continued alcohol seeking despite footshock, supporting a role for habits in punishment
resistance [36,37]. In contrast, while extended training with food rewards leads to increased
use of habits, it has not been shown to increase punishment resistance [3,4,39–43]. In
summary, extended training with cocaine or alcohol results in habitual responding in the
majority of animals, as well as punishment resistance in a subset of animals, and these two
processes may be linked in addiction.
To investigate the relationship between habitual cocaine seeking and punishment resistance,
we trained male and female rats to self-administer IV cocaine on a seeking-taking chained
schedule of reinforcement, originally developed by Olmstead et al. [44,45] and used
extensively to study punishment resistance [3,4,27,36,37,42,43,46–51]. Thus, we use the
term “seeking” to refer to responding that is not immediately followed by reward (e.g., when
responding on the initial link in a chained schedule, on a partial-reinforcement schedule,
or under extinction conditions); “seeking” describes the behavior and not the underlying
cognitive process, as is the convention in behavioral psychology [52]. We then exposed
rats to four days of punishment testing and used outcome devaluation via cocaine satiety
to assess whether seeking was goal-directed or habitual four days pre-punishment and at
least four days post-punishment. We found that punishment resistance for cocaine was
associated with habitual responding post-punishment but not pre-punishment in both males
and females. In parallel experiments in which rats were trained to self-administer food
instead, we also found that punishment resistance was associated with habitual responding
post-punishment but not pre-punishment in males, although it was associated with habitual
responding both pre- and post-punishment in females. Overall, these data indicate that
punishment resistance is associated with inflexible habits, whereas punishment sensitivity is
associated with increased goal-directed control.
2. Material and methods
2.1. Animals
Male and female Sprague Dawley rats (initial weight 225–250 g; Charles River, Raleigh,
NC, USA) were single-housed in a temperature- and humidity-controlled facility accredited
by AAALAC at Texas A&M University. Rats were housed under a reversed 12-h light/dark
cycle (lights off at 6 a.m.), with food and water access ad libitum, except when noted below.
All experiments were approved by the IACUC at Texas A&M and conducted according
to specifications of the NIH as outlined in the Guide for the Care and Use of Laboratory
Animals.
2.2. Surgery
For cocaine self-administration studies, rats were anesthetized via isoflurane (induction 5
%, maintenance 1–3 %), given a nonsteroidal anti-inflammatory analgesic (ketoprofen, 2
mg/kg, s.c.), and implanted with chronic indwelling IV jugular catheters, as previously
described [53]. Beginning three days after surgery, catheters were flushed once daily with
0.1 ml of cefazolin (100 mg/ml) and 0.1 ml heparin (500 U/ml). Self-administration sessions
began after at least five days of recovery from surgery.
2.3. Cocaine self-administration
Rats were trained to self-administer IV cocaine (0.5 mg/kg per infusion) on a seeking-taking
chained schedule of reinforcement, in which completion of a random ratio (RR20) or
random interval (RI60) schedule on the seeking lever gave access to the taking lever during
daily 2-h sessions. RR20 and RI60 schedules were used due to their influence on the
development of goal-directed and habitual responding, respectively [13,15,16,18,21–24].
Infusions of cocaine (pump speed of 70 μg/sec) were paired with 5-sec tone and light
cues (78 dB, 2900 Hz; white stimulus light above the active lever). Operant conditioning
chambers were housed in sound-attenuating cubicles and controlled via MED-PC IV (Med-
Associates, St. Albans, VT). Cocaine HCl was obtained as a gift through the NIDA Drug
Supply Program and diluted in sterile 0.9 % saline.
To train animals, self-administration began with fixed ratio (FR) 1 reinforcement, with
only the taking lever available (criterion of 5 sessions ≥20 infusions). Rats were food-
restricted (85–90 % of free-feeding weight) at the start of the experiment to increase general
motivation and were placed back onto free feeding once they had at least two consecutive
sessions where they earned ≥20 infusions. Training then progressed to a chained seeking-
taking schedule with FR1 (seeking) - FR1 (taking) reinforcement (criterion of 2 days ≥15
infusions), during which completion of the seeking link of the chain led to retraction of the
seeking lever and extension of the taking lever; completion of the taking link of the chain
delivered cocaine and led to retraction of the taking lever and the start of the next trial.
During the seeking link of the chain, a stimulus light (S+) was presented above the seeking
lever and signaled availability. At the next stage, rats were given a 4-min time out between
trials, such that completion of the taking link of the chain led to retraction of the taking lever
and extension of the seeking lever, but with no S+ and no programmed consequence for
responding (criterion of 2 days ≥15 infusions). Training then progressed to RR or RI seeking
schedules, and the taking lever was available for only 60 sec or until an infusion was earned
(FR1), whichever occurred first. Each animal was trained on only one schedule (either RR
or RI). For the RI schedule, the first press on the seeking lever initiated the start of the
random interval, and then the first press made following the random interval completed the
schedule. Training for the seeking lever began at RR3 or RI10 (criterion of 2 days ≥15
infusions), progressed to RR10 or RI30 (criterion of 2 days ≥15 infusions), and then to the
final schedule of RR20 or RI60 (criterion of 5 days ≥15 infusions). The MED-PC program
determined the random ratio or interval for a given trial via a probability function (i.e., 0.05
probability per lever press for RR20; 0.0166 probability per second for RI60). Animals were
removed from studies if they did not meet the minimum criteria after two weeks at a given
stage of training.
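The probability function described above can be illustrated with a minimal sketch. This is not the MED-PC program used in the study; the function names, the once-per-second evaluation for the interval schedule, and the use of Python's `random` module are assumptions made for illustration. With a fixed probability per press or per second, the requirement on each trial follows a geometric distribution, so the mean is 1/0.05 = 20 presses for RR20 and 1/0.0166 (about 60 s) for RI60.

```python
import random

def sample_rr_requirement(p_per_press=0.05, rng=None):
    """Presses needed to complete one random ratio trial (geometric draw)."""
    rng = rng if rng is not None else random.Random()
    presses = 1
    while rng.random() >= p_per_press:
        presses += 1
    return presses

def sample_ri_duration(p_per_second=0.0166, rng=None):
    """Seconds until the random interval elapses (assumed one check per second)."""
    rng = rng if rng is not None else random.Random()
    seconds = 1
    while rng.random() >= p_per_second:
        seconds += 1
    return seconds

def expected_requirement(p):
    """Mean of a geometric draw with per-event probability p."""
    return 1.0 / p

# Hypothetical usage: compare simulated means with the analytic means.
rng = random.Random(0)
n = 5000
rr_mean = sum(sample_rr_requirement(rng=rng) for _ in range(n)) / n
ri_mean = sum(sample_ri_duration(rng=rng) for _ in range(n)) / n
print(f"RR20: simulated {rr_mean:.1f}, analytic {expected_requirement(0.05):.1f} presses")
print(f"RI60: simulated {ri_mean:.1f}, analytic {expected_requirement(0.0166):.1f} s")
```

The sketch models only the size of the requirement on each trial. For the RI schedule in the study, the interval started with the first seeking press and the schedule was completed by the first press after the interval elapsed, which the sketch does not simulate.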
2.4. Cocaine outcome devaluation
Once animals were trained on the final seeking-taking schedule, outcome devaluation was
tested across consecutive days in a within-subject manner (devaluation and nondevaluation
days, counterbalanced order). As described previously [13], on the day of outcome
devaluation, rats were placed into the operant conditioning chambers and after a 5-min
habituation period, were given experimenter-administered IV cocaine, consisting of 10
μl (to fill the catheter volume) plus a dose that mimicked the estimated brain cocaine
concentrations during self-administration, based on the average infusions of the 4 previous
self-administration sessions. Rats were given either 1.0 mg/kg (average of 11–17 infusions
during self-administration), 1.5 mg/kg (18–24 infusions), or 2.0 mg/kg (25–31 infusions), in
increments of 0.5 mg/kg infusions separated by 20 sec. After a 60-sec waiting period, the
seeking lever was available with S+ for 10 min under extinction conditions. On the day of
nondevaluation, no infusions were administered but animals spent a similar amount of time
in the chamber prior to starting the 10-min extinction test. Devaluation and nondevaluation
responding was normalized per rat, such that the number of lever presses on one session
was divided by the total lever presses on both sessions (e.g., devaluation lever presses /
(devaluation lever presses + nondevaluation lever presses)). Each 10-min devaluation or
nondevaluation test was followed by a 5-min period with no levers extended and then the
start of a typical cocaine self-administration session. For acclimation purposes, at least two
days prior to the first devaluation test, rats were given a 10-min extinction session similar
to the nondevaluation day. If animals failed to meet a criterion of ≥10 presses during the
nondevaluation test session, then both the devaluation and nondevaluation sessions were
repeated; if they failed again, then they were removed from analyses.
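The per-rat normalization and the inclusion criterion described above can be written as a short sketch. The function names are hypothetical and the press counts in the usage lines are invented for illustration; only the formula (presses in one session divided by total presses across both sessions) and the criterion of ≥10 nondevaluation presses come from the text.

```python
def normalize_responding(devalued_presses, nondevalued_presses):
    """Return (devalued, nondevalued) responding as fractions of total presses."""
    total = devalued_presses + nondevalued_presses
    if total == 0:
        raise ValueError("no lever presses in either test session")
    return devalued_presses / total, nondevalued_presses / total

def meets_test_criterion(nondevalued_presses, minimum_presses=10):
    """Criterion from the text: at least 10 presses in the nondevaluation test."""
    return nondevalued_presses >= minimum_presses

# Hypothetical rat: 12 presses after cocaine satiety, 28 presses without it.
devalued, nondevalued = normalize_responding(12, 28)
print(devalued, nondevalued, meets_test_criterion(28))
```

Under this normalization the two values sum to 1, so a devalued value of 0.5 means equal responding in both tests (no effect of devaluation), and lower values indicate greater sensitivity to devaluation.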
2.5. Food self-administration
Separate groups of rats were trained to self-administer food pellets (45-mg plain purified
pellets, Bio-Serv, Flemington, NJ). Rats were mildly food-restricted for the entire
experiment but still gained weight. Rats were fed in the home cage each day >1 h after
the operant conditioning session ended and were fed the maximum amount possible that also
resulted in all food eaten before the next day’s session (~50 g for males, ~20 g for females).
Rats underwent the same training as described above for cocaine with a seeking-taking
chained schedule of reinforcement (RR20 or RI60), except that a press on the taking lever
resulted in delivery of a food pellet paired with tone and light cues. The first two training
stages differed from cocaine in that the criterion was 3 days =30 pellets for FR1 taking and 2
days ≥20 pellets for FR1 seeking-taking; criterion for subsequent training stages was similar.
Rats experienced a time out between trials, although it lasted only 1 min and the seeking lever was
retracted. Sessions were limited to 1 h or 30 rewards, whichever occurred first, so that the
total trials per session were comparable to cocaine studies.
2.6. Food outcome devaluation
Once animals were trained on the final seeking-taking schedule, outcome devaluation
was tested in a within-subject manner via sensory-specific satiety (devaluation and
nondevaluation days, counterbalanced order). Rats were allowed to free-feed on either
45-mg plain purified food pellets (for the devaluation day; the same pellets earned during
self-administration) or 15 % sucrose solution (for the nondevaluation day) in the home
cage for 1 h prior to being placed into the operant conditioning chamber. After a 5-min
waiting period, the seeking lever was available for 10 min under extinction conditions,
and then rats were returned to the home cage. A normal self-administration session took
place the next day, and then the second test (devaluation or nondevaluation, depending
on counterbalanced order) took place the following day. Devaluation and nondevaluation
responding was normalized per rat, as described above for cocaine outcome devaluation.
Rats were acclimated to 15 % sucrose by giving it in the home cage overnight ≥ three days
prior to the first devaluation test.
2.7. Footshock punishment
Once animals were trained on the final seeking-taking schedule (≥13 days for cocaine,
≥10 for food), and ≥4 days after outcome devaluation, they received four consecutive days
of punishment sessions. Rats received a minimum of 26 total cocaine self-administration
sessions, or 21 total food self-administration sessions, prior to punishment testing. During
the punishment sessions, footshock (0.4 mA, 0.3 sec) was administered randomly on one-third of
trials, after completion of the seeking link and before extension of the taking lever.
Rewards were still available on footshock trials. After the four days of punishment, rats
returned to daily self-administration and were allowed to recover to baseline responding
levels (≥4 sessions with ≥10 rewards) before outcome devaluation testing.
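The order of events on a punishment trial can be sketched as follows. The text does not state whether footshock trials were drawn independently with probability 1/3 or as an exact third of each session; the sketch assumes an independent draw per trial, and the function names are hypothetical.

```python
import random

def assign_footshock(n_trials, p_shock=1/3, rng=None):
    """Mark each trial as shocked or not (assumed independent draw per trial)."""
    rng = rng if rng is not None else random.Random()
    return [rng.random() < p_shock for _ in range(n_trials)]

def trial_events(shocked):
    """Order of events in one trial; reward stays available on shock trials."""
    events = ["seeking link completed"]
    if shocked:
        events.append("footshock (0.4 mA, 0.3 sec)")
    events += ["taking lever extended", "reward available"]
    return events

# Hypothetical session of 15 trials.
for shocked in assign_footshock(15, rng=random.Random(0)):
    print(" -> ".join(trial_events(shocked)))
```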
2.8. Estrous cycle evaluation
Estrous cycle was evaluated via vaginal smears and cytology. Daily swabbing took place
starting several days before punishment and then through punishment testing. After the
self-administration session, a cotton swab wet with filtered deionized water was used for
vaginal swabbing and then smeared onto a glass slide. The phases of the estrous cycles were
determined by viewing dried noncoverslipped slides under a microscope and categorizing as
proestrus, estrus, metestrus, or diestrus according to the proportions of cells, as described by
Ajayi & Akhigbe [54].
2.9. Shock sensitivity threshold testing
Rats were tested for shock sensitivity before all experimentation (with the exception of two
male rats in the cocaine group). Rats were placed into operant chambers and given a series
of footshocks (0.3 sec duration, ≥ 10 sec inter-shock interval), in an ascending series of
intensities from 0.1 to 1.0 mA in 0.1-mA steps. The rats were scored for their first flinch,
jump, and vocalization to the footshock, as described by Maren et al. [55]. Following
vocalization, the ascending series was repeated twice more, and scores across the three test
sessions were averaged.
2.10. Data analyses
Animals were removed from all analyses if they failed to meet the criteria for self-
administration (described in Methods). Data were analyzed using t-tests or 2-way or 3-way
ANOVAs (with repeated measures when appropriate) as detailed in the Results, with Sidak’s
multiple comparisons tests used for post hoc analyses; post hoc results are shown on
figures. Statistical results are reported for effects with significant p values (< 0.05). K-means
clustering analysis was used to identify and separate punishment-sensitive and
punishment-resistant groups.
Correlation analyses were evaluated via the Pearson correlation coefficient (r). Figures show
means ± SEM.
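For reference, the two summary statistics named above can be computed as in the following sketch. It uses only the Python standard library and is not the software used by the authors; the function names are hypothetical and the numbers in the usage lines are invented for illustration.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)

def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# Invented values: % baseline trials on punishment day 4 vs. devalued responding.
percent_baseline = [20.0, 45.0, 70.0, 95.0, 100.0]
devalued_responding = [0.25, 0.30, 0.45, 0.40, 0.55]
print(pearson_r(percent_baseline, devalued_responding))
print(mean_and_sem(devalued_responding))
```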
3. Results
3.1. Cocaine self-administration
Male and female rats were trained on a seeking-taking chained schedule of self-
administration for IV cocaine (2 h per day) and then exposed to four days of punishment
testing (Fig. 1). Rats were trained on either an RR20 or RI60 schedule for the seeking lever
because these schedules have been shown to influence the development of goal-directed
and habitual responding [13,15,16,18,21–24]. For each rat, the four days before punishment
were used as a baseline to assess the effects of punishment. Rats that completed ≥65 %
of baseline trials in the fourth punishment session were considered punishment resistant,
whereas rats that completed <65 % were considered punishment sensitive. We established
this threshold of 65 % based on a larger population of male rats exposed to punishment and
k-means clustering analysis identifying two clusters with a consistent split at 65 % (Fig. S1).
A subset of these male rats is included in the following analyses.
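The classification rule above, together with a two-cluster split of the kind k-means produces on one-dimensional data, can be sketched as follows. Several details are assumptions: the baseline is taken as the mean of the four pre-punishment days, the clustering is a plain Lloyd's iteration initialized at the minimum and maximum, and all data values are invented. How the authors derived the 65 % split from their clusters is not specified in the text, so the sketch does not attempt to reproduce it.

```python
def percent_of_baseline(day4_trials, baseline_trials):
    """Trials on punishment day 4 as a percent of the mean baseline (assumed mean)."""
    baseline = sum(baseline_trials) / len(baseline_trials)
    return 100.0 * day4_trials / baseline

def classify(percent, threshold=65.0):
    """Resistant if at least 65 % of baseline trials were completed."""
    return "resistant" if percent >= threshold else "sensitive"

def kmeans_two_clusters_1d(values, max_iterations=100):
    """Return the (low, high) cluster centers for one-dimensional data."""
    low_center, high_center = min(values), max(values)
    if low_center == high_center:
        raise ValueError("need at least two distinct values")
    for _ in range(max_iterations):
        low = [v for v in values if abs(v - low_center) <= abs(v - high_center)]
        high = [v for v in values if abs(v - low_center) > abs(v - high_center)]
        new_low, new_high = sum(low) / len(low), sum(high) / len(high)
        if (new_low, new_high) == (low_center, high_center):
            break  # assignments no longer change
        low_center, high_center = new_low, new_high
    return low_center, high_center

# Invented percent-of-baseline scores for eight hypothetical rats.
scores = [10.0, 20.0, 25.0, 30.0, 80.0, 90.0, 95.0, 100.0]
print(kmeans_two_clusters_1d(scores))
print([classify(s) for s in scores])
print(classify(percent_of_baseline(13, [20, 20, 20, 20])))
```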
Sensitive and resistant rats were significantly different in terms of percent trials completed
during punishment, for both males (Fig. 2a; 2-way ANOVA: Group F(1,26) = 26.3, p < 0.0001;
Day F(11,286) = 34.4, p < 0.0001; Group × Day interaction F(11,286) = 9.32, p < 0.0001) and
females (Fig. 2b; 2-way ANOVA: Group F(1,23) = 22.2, p < 0.0001; Day F(11,253) = 22.9,
p < 0.0001; Group × Day interaction F(11,253) = 6.57, p < 0.0001). Sensitive and resistant rats
also differed in terms of total trials completed during punishment, but not before punishment, for
males (Fig. 2c; 2-way ANOVA: Group p = 0.09; Day F(11,286) = 33.1, p < 0.0001; Group ×
Day interaction F(11,286) = 9.63, p < 0.0001) and females (Fig. 2d; 2-way ANOVA: Group
F(1,23) = 7.06, p = 0.014; Day F(11,253) = 21.7, p < 0.0001; Group × Day interaction
F(11,253) = 5.75, p < 0.0001). Males and females were not significantly different from each other
in terms of punishment resistance for cocaine, when comparing percent trials completed on the
fourth day of punishment for all rats (t(51) = 0.63, p = 0.53; male average 57.8 % vs. female
53.0 %).
We used outcome devaluation via cocaine satiety to assess whether responding was goal-
directed or habitual four days pre-punishment and at least four days post-punishment (once
rats had recovered from punishment), with each rat given devaluation and nondevaluation
sessions in a counterbalanced order. In male rats (Fig. 2e), both punishment-sensitive
and -resistant rats were insensitive to outcome devaluation pre-punishment, indicating
habitual responding. However, punishment-sensitive rats showed increased sensitivity to
outcome devaluation post-punishment, indicating enhanced goal-directed control, while
punishment-resistant rats remained habitual (2-way ANOVA: Devaluation F(1,52) = 12.7,
p < 0.001; Devaluation × Group interaction F(3,52) = 3.16, p = 0.03). In female rats (Fig. 2f),
punishment-sensitive rats were sensitive to outcome devaluation pre- and post-punishment,
while punishment-resistant rats were insensitive to outcome devaluation pre- and
post-punishment, indicating habitual behavior (2-way ANOVA: Devaluation F(1,46) = 20.2,
p < 0.0001). Raw data for devaluation testing are shown in Fig. S2.
Further statistical analyses were used to compare males and females, and to evaluate
potential differences for rats trained on RR20 and RI60 schedules of reinforcement. We
first compared males and females within the sensitive and resistant groups. We found a
significant main effect of Sex in the sensitive group (3-way ANOVA for Sex × Devaluation
× Pre/post: Sex F(1,29) = 5.04, p = 0.033), but no post hoc differences and no significant
effects within the resistant group. A Devaluation × Pre/post interaction supported the
conclusion that sensitive rats showed increased goal-directed control in response to punishment
(Devaluation F(1,29) = 29.4, p < 0.0001; Devaluation × Pre/post interaction F(1,29) = 8.45,
p = 0.0069), with post hoc analysis revealing significant sensitivity to devaluation
post-punishment for males (p < 0.0001) and females (p = 0.0048). In contrast,
resistant rats showed no significant main effects or interactions. We then compared rats
trained on RR20 vs. RI60 schedules within the sensitive and resistant groups and found no
effect of Schedule in either group (3-way ANOVAs for Schedule × Devaluation × Pre/post),
which parallels our previous work showing that schedule only weakly influences strategy
for cocaine seeking [13]. Nevertheless, the results again indicated increased goal-directed
control in the sensitive group (Devaluation F(1,29) = 29.0, p < 0.0001; Devaluation × Pre/post
interaction F(1,29) = 7.29, p = 0.012), with post hoc analysis revealing significant sensitivity to
devaluation post-punishment for rats trained on RR20 (p = 0.0003) and RI60 (p = 0.0027).
In contrast, resistant rats showed no significant main effects or interactions. Finally, a
comparison of RR20 and RI60 for sensitivity to punishment on the fourth day revealed no
difference (t(51) = 1.06, p = 0.29).
We then determined whether habitual responding was predictive of punishment resistance.
When rats were classified as goal-directed (<0.4 for normalized devalued responding) or
habitual (≥0.4) based on pre-punishment outcome devaluation, there was no difference in
terms of baseline responding or the fourth day of punishment, for male rats (Fig. 3a;
2-way ANOVA: Session F(1,26) = 58.5, p < 0.0001; Session × Strategy interaction p = 0.15)
or female rats (Fig. 3b; 2-way ANOVA: Session F(1,23) = 74.6, p < 0.0001; Session
× Strategy interaction p = 0.44). Similarly, when classified as goal-directed or habitual
based on post-punishment outcome devaluation, there was also no difference in terms
of baseline responding or the fourth day of punishment, for male rats (Fig. 3c; 2-way
ANOVA: Session F(1,26) = 57.0, p < 0.0001; Session × Strategy interaction p = 0.13) or
female rats (Fig. 3d; 2-way ANOVA: Session F(1,23) = 76.3, p < 0.0001; Session × Strategy
interaction p = 0.21). However, we found that punishment resistance was correlated with
habits post-punishment, but not pre-punishment. Specifically, responding on the fourth day
of punishment (% baseline trials) correlated with devalued responding during outcome
devaluation conducted post-punishment in males (Fig. 3e; r = 0.35, p = 0.069) and females
(Fig. 3f; r = 0.40, p = 0.045). In contrast, responding on the fourth day of punishment did not
correlate with devalued responding during outcome devaluation conducted pre-punishment in
males (r = 0.24, p = 0.23) or females (r = −0.03, p = 0.87). Interestingly, habitual responding
was not required for punishment resistance,
and some male and female rats that showed resistance (≥65 % on x-axis) also showed
goal-directed responding post-punishment (<0.4 on y-axis).
3.2. Food self-administration
Separate groups of male and female rats were trained on a seeking-taking chained schedule
of self-administration for food and then exposed to four days of punishment testing (Fig.
1). Food self-administration sessions were limited to 1 h or 30 rewards (whichever occurred
first), so we used reward rate (pellets per min) to more accurately assess the effects of
punishment. For each rat, the four days before punishment were used as a baseline. Similar
to cocaine, we used a threshold of 65 % on the fourth punishment session to identify rats
that were resistant to punishment. Sensitive and resistant rats were significantly different
in terms of percent baseline during punishment, for both males (Fig. 4a; 2-way ANOVA:
Group F(1,31) = 18.1, p = 0.0002; Day F(11,341) = 11.9, p < 0.0001; Group × Day interaction
F(11,341) = 6.37, p < 0.0001) and females (Fig. 4b; 2-way ANOVA: Group F(1,20) = 16.3,
p = 0.0006; Day F(11,220) = 14.1, p < 0.0001; Group × Day interaction F(11,220) = 6.36,
p < 0.0001).
Sensitive and resistant rats also differed in terms of reward rate during punishment, but not
before punishment, for males (Fig. 4c; 2-way ANOVA: Group p = 0.11; Day F(11,341) = 12.8,
p < 0.0001; Group × Day interaction F(11,341) = 7.12, p < 0.0001) and females (Fig. 4d; 2-way
ANOVA: Group p = 0.99; Day F(11,220) = 13.3, p < 0.0001; Group × Day interaction
F(11,220) = 6.61, p < 0.0001).
Males were significantly more resistant than females for food, when comparing percent
baseline on the fourth day of punishment for all rats (t(53) = 2.4, p = 0.020; male average
79.2 % vs. female 59.0 %). However, there was no significant difference between males and
females for baseline reward rate during self-administration (t(53) = 1.9, p = 0.066), despite
large differences in weight between males and females at the time of punishment testing
(t(53) = 19.2, p < 0.0001; male average 470 g, female 295 g). We found that males were more
resistant for food than cocaine (t(59) = 2.9, p = 0.0057), but there was no difference in females
for resistance to food and cocaine (t(45) = 0.69, p = 0.49).
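For readers who want to see how the degrees of freedom in these comparisons arise, the sketch below computes a pooled-variance (Student's) two-sample t statistic. It rests on assumptions: the section does not state which t-test variant was used, and the group sizes of 33 males and 22 females are inferred from the ANOVA error degrees of freedom reported above (31 + 2 and 20 + 2), not given directly. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test with pooled variance.

    df = n_a + n_b - 2, so two groups of 33 and 22 animals give df = 53.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; t is undefined")
    standard_error = sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


# Hypothetical percent-baseline values; not data from the study.
t_stat, dof = pooled_t([80.0, 75.0, 90.0, 70.0], [60.0, 55.0, 65.0])
print(dof)
```

Under the same assumption, the food versus cocaine comparisons with 59 and 45 degrees of freedom would correspond to combined group sizes of 61 males and 47 females.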
We used outcome devaluation via satiety to assess whether responding was goal-directed
or habitual four days pre-punishment and at least four days post-punishment (once rats
had recovered from punishment), with each rat given devaluation sessions (food pellets in
home cage) and nondevaluation sessions (15 % sucrose in home cage) in a counterbalanced
order. In male rats (Fig. 4e), both punishment-sensitive and -resistant rats were sensitive to
outcome devaluation pre- and post-punishment, although sensitive rats showed even greater
sensitivity post-punishment, indicating enhanced goal-directed control (2-way ANOVA:
Devaluation F(1,62) = 73.6, p < 0.001; Devaluation × Group interaction p = 0.10). In female
rats (Fig. 4f), punishment-sensitive rats were sensitive to outcome devaluation pre- and
post-punishment, while punishment-resistant rats were insensitive to outcome devaluation
pre- and post-punishment, indicating habitual behavior (2-way ANOVA: Devaluation
F(1,40) = 23.3, p < 0.0001; Group F(3,40) = 640, p < 0.0001; Devaluation × Group interaction
F(3,40) = 5.55, p = 0.0028). This mimicked what we observed with cocaine punishment in female rats.
Raw data for devaluation testing is shown in Fig. S3.
Further statistical analyses were used to compare males and females within the sensitive
and resistant groups. This analysis indicated significant effects for Sex within the sensitive
group (3-way ANOVA for Sex × Devaluation × Pre/post: Sex F(1,19) = 683, p < 0.0001; Sex ×
Pre/post F(1,19) = 14.8, p = 0.0011) and the resistant group (Sex F(1,32) = 831, p < 0.0001; Sex
× Devaluation interaction F(1,32) = 6.29, p = 0.017; Sex × Pre/post interaction F(1,32) = 125,
p < 0.0001), but no significant post hoc differences between males and females. Even though
males tended to be goal-directed regardless of whether they were sensitive or resistant to
punishment, we still observed that sensitive rats showed increased goal-directed control
in response to punishment, as indicated by a significant interaction between Devaluation
and Pre/post (Devaluation F(1,19) = 47.1, p < 0.0001; Pre/post F(1,19) = 30.1, p < 0.0001;
Devaluation × Pre/post interaction F(1,19) = 7.74, p = 0.012), with post hoc analysis revealing
significant sensitivity to devaluation pre-punishment for males (p = 0.033) and females
(p = 0.0029) and post-punishment for males (p < 0.0001) and females (p = 0.0001). In contrast,
resistant rats did not show a significant Devaluation × Pre/post interaction, and only showed
significant main effects (Devaluation F(1,32) = 11.9, p = 0.0016; Pre/post F(1,32) = 169,
p < 0.0001), with post hoc analysis revealing significant sensitivity to devaluation for males
pre-punishment (p = 0.0002) and post-punishment (p = 0.0008) but not for females.
We then evaluated potential differences for rats trained on RR20 vs. RI60 within the
sensitive and resistant groups. We found several effects for Schedule in the sensitive
group (3-way ANOVA for Schedule × Devaluation × Pre/post: Schedule F(1,19) = 8.91,
p = 0.0076; Schedule × Devaluation interaction F(1,19) = 6.10, p = 0.023; Schedule × Pre/post
interaction F(1,19) = 17.7, p = 0.0005), with post hoc analysis revealing significant sensitivity
to devaluation for RR20 both pre-punishment (p < 0.0001) and post-punishment (p < 0.0001),
and for RI60 post-punishment only (p = 0.0043). This parallels previous work
showing an influence of schedule on strategy for food seeking [13,15,16,18,21–24]. Within
the resistant group, there was no main effect for Schedule, but a significant Schedule ×
Pre/post interaction (F(1,32) = 40.0, p < 0.0001), with post hoc analysis revealing significant
sensitivity to devaluation for RR20-trained rats both pre-punishment (p = 0.0056) and
post-punishment (p = 0.031). We again observed support for the conclusion that sensitive
rats showed increased goal-directed control in response to punishment, as indicated by a
significant Devaluation × Pre/post interaction (Devaluation F(1,19) = 60.4, p < 0.0001;
Pre/post F(1,19) = 37.1, p < 0.0001; Devaluation × Pre/post interaction F(1,19) = 6.36, p = 0.021),
while resistant rats only showed main effects without an interaction (Devaluation
F(1,32) = 22.0, p < 0.0001; Pre/post F(1,32) = 62.4, p < 0.0001). Finally, a comparison of RR20 and
RI60 for sensitivity to punishment on the fourth day revealed no difference (t(53) = 0.49,
p = 0.63). Therefore, RR20-trained rats were goal-directed pre- and post-punishment, regardless
of whether they were sensitive or resistant to punishment, while RI60-trained rats were
habitual pre-punishment but became goal-directed if they were sensitive to punishment.
We determined whether habitual responding was predictive of punishment resistance. When
rats were classified as goal-directed (<0.4 for normalized devalued responding) or habitual
(≥0.4) based on pre-punishment outcome devaluation, there was no difference in terms
of baseline responding on the fourth day of punishment, for male rats (Fig. 5a; 2-way
ANOVA: Session F(1,31) = 11.0, p = 0.0024; Session × Strategy interaction p = 0.61)
or female rats (Fig. 5b; 2-way ANOVA: Session F(1,20) = 28.4, p < 0.0001; Session ×
Strategy interaction p = 0.082). When classified as goal-directed or habitual based on
post-punishment outcome devaluation, male rats did not show a significant difference in
terms of baseline responding or the fourth day of punishment (Fig. 5c; 2-way ANOVA:
Session F(1,31) = 7.66, p = 0.0095; Session × Strategy interaction p = 0.12). However, female
rats showed a significant difference for the fourth day of punishment, indicating that the rats
classified as habitual post-punishment showed greater punishment resistance (Fig. 5d; 2-way
ANOVA: Session F(1,20) = 37.0, p < 0.0001; Session × Strategy interaction F(1,20) = 8.96,
p = 0.0072). Similar to cocaine studies, we found that punishment resistance was correlated
with habits post-punishment. Responding on the fourth day of punishment (% baseline
reward rate) correlated with devalued responding during outcome devaluation conducted
post-punishment in males (Fig. 5e; r = 0.39, p = 0.024), but not pre-punishment (r = 0.16,
p = 0.38). In contrast, in females, punishment resistance correlated with devalued responding
both pre-punishment (Fig. 5f; r = 0.44, p = 0.040) and post-punishment (r = 0.64, p = 0.0015).
Particularly for male rats, habitual responding was not required for punishment resistance,
and some male rats that showed resistance (≥65 % on x-axis) also showed goal-directed
responding post-punishment (<0.4 on y-axis).
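The two analysis steps in this paragraph, labeling each rat by strategy and correlating punishment resistance with devalued responding, can be sketched as follows. The 0.4 cutoff comes from the text; everything else is an assumption for illustration: the normalized devalued responding values are taken as already computed (the normalization is not defined in this section), the reported r is assumed to be a Pearson coefficient, and the function names and example values are hypothetical.

```python
from math import sqrt

# Cutoff from the text: normalized devalued responding below 0.4 is treated
# as goal-directed, and 0.4 or above as habitual.
HABIT_THRESHOLD = 0.4


def strategy(normalized_devalued):
    """Label one rat from its normalized devalued responding."""
    if normalized_devalued >= HABIT_THRESHOLD:
        return "habitual"
    return "goal-directed"


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical values; not data from the study.
percent_baseline_day4 = [20.0, 45.0, 70.0, 90.0, 100.0]
devalued_responding = [0.15, 0.30, 0.35, 0.55, 0.60]

labels = [strategy(v) for v in devalued_responding]
r = pearson_r(percent_baseline_day4, devalued_responding)
print(labels, round(r, 2))
```

In this made-up example the third rat is resistant (70 % of baseline) yet goal-directed (0.35), the same kind of case the text notes for some male rats.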
3.3. Influences on punishment resistance
We ran correlations to determine whether punishment resistance was associated with
and/or might be explained by differences in footshock sensitivity or weight. To determine
sensitivity to footshock, we conducted footshock threshold testing (threshold for flinch,
jump, and vocalization, or FJV) prior to any self-administration training. We found no
difference for initial footshock sensitivity between punishment-resistant and -sensitive
groups for cocaine or food in males or females (2-way ANOVAs of FJV vs. group, p > 0.05).
For cocaine, we found a significant correlation in female rats for vocalization threshold
and punishment resistance (r = 0.59, p = 0.0021), but no other correlations in females or
males (Fig. 6a, b). For food, we found no correlations between footshock sensitivity and
punishment in males or females (Fig. 6c, d).
We found no significant correlations between punishment sensitivity and animal weight
on the first day of punishment; the average weights on the first day of punishment were:
cocaine males (540 g), cocaine females (320 g), food males (470 g), and food females (300
g). The average weight gain from the start of self-administration to punishment testing was:
cocaine males (200 g), cocaine females (50 g), food males (100 g), and food females (20
g). We also found no significant correlations between footshock sensitivity testing and the
starting weight of the animals in females or males. These data indicate that differences
in footshock sensitivity or punishment resistance were not related to differences in animal
weight.
We ran correlations between punishment sensitivity and several other self-administration
measures for cocaine and food in males and females. We found no significant correlations
with the number of seeking presses in the final seeking-taking sessions prior to punishment
testing (p > 0.05). We also found no correlations with self-administration rates during the
final seeking-taking sessions (Fig. 6e–h), although we observed a significant correlation with
early FR1 self-administration for food in females (Fig. 6h; r = 0.44, p = 0.039) and a nearly
significant correlation with early FR1 self-administration for cocaine in males (Fig. 6e;
r = −0.35, p = 0.07). In other words, females that showed greater punishment resistance
for food also self-administered food at a quicker rate under continuous reinforcement
(FR1) conditions, which may indicate higher motivation for food. In contrast, males that
showed greater punishment resistance for cocaine tended to self-administer less cocaine
under continuous reinforcement conditions.
Finally, we also investigated possible influences of estrous cycle on punishment sensitivity
in female rats. We categorized rats by cycle phase for each of the four days of punishment
and found no differences in punishment sensitivity for cocaine or food (Fig. S4), indicating
that punishment sensitivity was not related to estrous cycle.
4. Discussion
We found that punishment resistance for cocaine was associated with habitual responding
after, but not before, punishment. In other words, habits did not predict punishment
resistance, but punishment resistance was related to the continued use of habits. We
observed similar results with rats trained to self-administer food, with punishment resistance
related to the continued use of habits, particularly in females. These data indicate that
punishment resistance is associated with inflexible habits, whereas punishment sensitivity is
associated with increased goal-directed control.
4.1. Punishment resistance and inflexible habits
These findings support the hypothesis that compulsive drug seeking is related to failure
to control habits [5–10]. While habits themselves are not necessarily maladaptive or
permanent, compulsive drug use may be related to a loss of control over habitual
seeking, making these habits inflexible and maladaptive. This idea is corroborated by our finding
that habitual cocaine seeking did not predict punishment resistance. Rather, punishment
resistance was related to inflexible habits, whereas punishment sensitivity was related
to increased goal-directed control over behavior. Interestingly, habitual behavior was not
necessary for punishment resistance and a subset of resistant animals showed goal-directed
cocaine seeking. This supports an alternative theory which posits that addiction is driven by
excessive goal-directed choice and/or over-valuation of drug reward, and that habits are not
necessary [56,57].
4.2. Punishment sensitivity and flexible habits
We observed that many rats showed a switch from habitual to goal-directed cocaine seeking
when faced with footshock consequences, particularly when sensitive to punishment.
Previous studies have also shown animals switching between goal-directed and habitual
response strategies. Most commonly, these studies demonstrated a transition from goal-
directed to habitual behavior, with the transition occurring gradually over time with
progressive training for food [40,41,58,59], cocaine [25,26], or alcohol [34,37]. Similarly,
a transition from DMS to DLS control over reward-seeking behavior has been depicted
over training [25,29,34,60]. However, several recent studies from Bouton and colleagues
demonstrated a transition from habitual to goal-directed responding, instead, and this
transition occurred rapidly [12]. The circumstances that caused behaviors to become goal-
directed include changes in context, changes in outcome, or unexpected food reinforcers,
even when delivered in a different context [58,61–64]. Here, we found that the addition of an
aversive outcome enhanced goal-directed responding, but only in animals that were sensitive
to footshock punishment. Altogether, these findings indicate that habits generally are not
permanent and that the goal-directed system can gain control over behavior.
Typically, habits are somewhat flexible. While habits are insensitive to changes in the
value of the outcome, they are sensitive to changes in the outcome. Therefore, habits are
not completely inflexible and behavior is typically updated under certain circumstances,
including after devaluation when an animal experiences the outcome in the devalued
state [9,23,40,65,66]. For example, rats over-trained for food responding showed habitual
behavior and insensitivity to outcome devaluation when tested under extinction conditions,
but then became sensitive and reduced responding after several minutes of experiencing the
food reward actually being delivered [9,40]. Lesions of DMS slowed learning of this effect,
indicating that goal-directed processes are typically recruited and that the learning process
in the habit system is slower [9]. Ostlund and Balleine [9] hypothesized that fast changes
in performance (e.g., when faced with negative consequences) require a transition to goal-
directed control, and that compulsive behavior in addiction may be related to a dominant
habit system and difficulty reengaging goal-directed control. The data presented here
support this hypothesis and indicate that the goal-directed and habitual systems function in
parallel, such that both DMS and DLS encode the behavior. This explains why post-training
inactivation of DMS or DLS leaves behavior intact but guided by the remaining system
[18,23,25,34,36], and why habitual behavior can rapidly transition to being goal-directed
[12].
We found that the addition of a footshock outcome enhanced goal-directed responding in
punishment-sensitive rats. This may seem contradictory to previous work showing that stress
biases toward habitual behavior in humans and animals (as reviewed by [10,67]). However,
the impact of stress is dependent on the controllability (or escapability) of the stressor.
While inescapable stress has negative long-term effects and recruits the habit system,
escapable stress is protective against future insults and recruits the goal-directed system,
including DMS and the prelimbic prefrontal cortex [68–73]. Previous work has implicated
impairments in prelimbic function with punishment resistance [46,74–76]. Therefore, it
is tempting to speculate that punishment-resistant rats may have a reduced ability to
detect control (or contingency) of the footshock [77]. Further, because prelimbic cortex
is necessary for the acquisition of goal-directed responding [78], reduced prelimbic function
might impair the ability to recruit the goal-directed system.
We cannot rule out the possibility that individual differences in responding after
experimenter-administered IV cocaine (i.e., differences in sensitivity to outcome
devaluation) could be attributed to individual differences in cocaine satiety, rather than
differences in the use of goal-directed and habitual strategies. However, this seems
unlikely given that the IV cocaine doses used for devaluation are tailored to individual self-
administration rates, and that sensitive and insensitive rats self-administer similar amounts
of cocaine and experience similar brain cocaine concentrations, indicating comparable
cocaine satiety [13]. Further, individual differences in cocaine satiety cannot fully explain
why some rats showed increased sensitivity to outcome devaluation following punishment
in the current study or why lesions of DMS or DLS affected sensitivity to outcome
devaluation in our previous study [13]. We would not expect footshock punishment to
change individual cocaine satiety, and even though many rats reduced responding and
infusions during punishment, we ensured that rats returned to baseline responding before
testing for post-punishment outcome devaluation. Likewise, we would not expect lesions of
DMS or DLS to alter individual cocaine satiety; accordingly, we observed no differences in
self-administered infusions with lesions, although we observed differences in sensitivity to
outcome devaluation [13]. Thus, it is unlikely that individual differences in cocaine satiety
account for the observed differences in sensitivity to outcome devaluation.
4.3. Reward and sex differences
We observed punishment resistance with both cocaine and food self-administration, and
even found that punishment resistance was greater for food than cocaine in male rats. In
contrast, previous studies did not observe punishment resistance with food, even in rats
with extended sucrose or chow self-administration [3,4,39,42,43]. In addition, punishment
resistance for cocaine was observed with extended, but not limited, exposure to cocaine self-
administration [2,3,27,79], although the pattern of intake is also an important consideration
[80,81]. These discrepancies may be attributed to differences in methods, including
footshock intensity, omission of cocaine on footshock trials, schedule of reinforcement, and
criteria for resistance.
We found no sex difference in punishment resistance for cocaine, but found increased
punishment resistance in males for food, as compared to females. Previous studies have
shown female rats to be more sensitive to punishment for cocaine and more sensitive
to punishment with risky food rewards [28,50,51,82,83]. Females also appeared to show
greater punishment resistance for cocaine than food [50]. In contrast, we observed no
significant difference for punishment resistance between food and cocaine for females but
found a significant difference between food and cocaine for males. Sex differences in
punishment sensitivity for food can likely be traced to sex differences in food motivation and
eating. Although we found no sex difference in food reward rate during self-administration,
male rats gained more weight than females across the course of the study, and previous
work showed that male rats work harder than females to earn food rewards, even despite
footshock risk [82,83]. We found that punishment-resistant rats were insensitive to outcome
devaluation pre- and post-punishment in most groups we studied (males and females with
cocaine, and females with food). However, male rats with punishment resistance for food
were sensitive to outcome devaluation pre- and post-punishment, indicating goal-directed
responding. Therefore, punishment resistance for food in male rats was not necessarily
related to inflexible habits, and in some animals, seemed to be more related to goal-directed
actions. Interestingly, punishment resistance for food in female rats was uniquely correlated
with habitual responding pre- and post-punishment, whereas all other groups (males and
females with cocaine, and males with food) were correlated with habitual responding post-
punishment but not pre-punishment. Punishment resistance for food in females was also
unique in that it showed a positive correlation with food reward rate under continuous
reinforcement conditions (early FR1 training). Therefore, the female rats that eventually
showed punishment resistance for food tended to be more motivated for food (faster reward
rate) and less sensitive to outcome devaluation (i.e., more habitual).
4.4. Mechanisms of punishment resistance
The current data support the theory that punishment resistance is related to loss of
control over habits [5–11]. However, it is important to note that support for this theory
is not mutually exclusive with other theories of addiction (see commentary by [84]). There
are multiple factors that contribute to drug seeking and addiction, and the factors may
differ across individuals (i.e. individual differences) or have a compound influence within
an individual. We believe that these factors are shaped by a combination of individual
vulnerability and drug experience. For example, with limited cocaine experience, we
would expect very few rats to show punishment resistance. Then again, with even more
cocaine self-administration experience than given in the current studies, we would likely
see greater punishment resistance (e.g., more rats showing resistance and greater resistance
within-subject). However, we would likely still have a subset of rats showing sensitivity
to punishment. Therefore, we believe that punishment resistance reflects interactions
between individual differences and drug-taking experience, analogous to the development
of addiction in humans. Several factors have been hypothesized or considered for a possible
influence on punishment resistance (via individual differences and/or drug experience),
including habitual behavior, goal-directed behavior, high value of cocaine, low value for
footshock, and reduced contingency learning [7,10,56,85–87].
Although we did not find that habits predicted punishment resistance for cocaine,
previous work showed that punishment-resistant alcohol seeking was greater in rats with
DLS-dependent alcohol seeking [47]. In addition, punished responding for cocaine was
reduced by DLS inactivation [33], but not by inhibition of DMS direct pathway [88],
further implicating DLS in punishment resistance. In contrast, support for the theory
that addiction is related to excess goal-directed motivation comes from work indicating
that habits are not prerequisite for punishment resistance [57], and that resistance was
associated with strengthened activity in a pathway between orbitofrontal cortex and DMS
[89,90]. In addition, punished responding for food in mice was associated with reward-
related dopamine signals in DMS and not DLS, and responding was reduced by DMS
manipulations [91]. We found that male rats showed goal-directed responding for food
self-administration, regardless of whether they were punishment sensitive or resistant, while
female rats showed habitual responding only if they were punishment resistant, which
indicates that the mechanisms underlying punishment resistance may differ across reward
type and sex. Therefore, both the habitual and goal-directed systems appear to be capable of
driving expression of punishment resistance.
Punishment resistance could also be related to increased motivation (or value) for the
reward or decreased sensitivity to the aversive consequence. In support of the former,
punishment resistance was associated with higher break point on a progressive ratio
schedule of reinforcement, as well as lower demand elasticity (i.e., high motivation) using
behavioral-economic measures [2,92–94]. However, other work showed no association
between punishment resistance and break point for cocaine or sucrose rewards, even
though higher doses of cocaine drove greater resistance [50,51]. In the current study, we
observed some evidence that punishment resistance for food in females was associated
with increased food motivation, as resistance correlated with reward rate under continuous
reinforcement conditions. Further, we observed a negative correlation between punishment
resistance for cocaine in males and the amount of cocaine self-administered under
continuous reinforcement conditions, which may be explained by differences in cocaine
value or satiety. However, there is little support for punishment resistance being driven
by decreased sensitivity to aversive consequences. Punishment-resistant alcohol seeking
was not related to differences in footshock-induced fear [47]. Further, punishment-resistant
cocaine seeking was not correlated with punishment-resistant sucrose seeking, indicating
that punishment resistance cannot simply be attributed to individual differences in footshock
sensitivity [50,51]. We also found that punishment resistance for cocaine or food could
not be explained by decreased sensitivity to footshock. Finally, studies using a conditioned-
punishment task for food rewards found little evidence that punishment resistance was
related to reward dominance or aversion insensitivity; instead, punishment resistance in rats
and humans seemed most causally related to a lack of learning the punishment contingency
and understanding the relationship between actions and aversive outcomes [77,95].
4.5. Conclusions and future directions
We found that punishment resistance for cocaine was associated with inflexible habits,
whereas punishment sensitivity was associated with exerting goal-directed control. We did
not find that habitual cocaine responding predicted punishment resistance. However, future
studies with extended training of cocaine self-administration might reveal that habits become
even more inflexible and predictive of punishment resistance. Future work might also
further explore the hypothesis that punishment resistance is related to impaired contingency
detection.
Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
Acknowledgments
The authors thank the many undergraduate researchers who assisted with conducting behavioral studies. This work
was supported by National Institutes of Health grant DA046457 (RJS) and Texas A&M University.
Data availability
Data will be made available on request.
References
[1]. Belin D, Mar AC, Dalley JW, Robbins TW, Everitt BJ, High impulsivity predicts the switch to
compulsive cocaine-taking, Science 320 (2008) 1352–1355, 10.1126/science.1158136.
[2]. Deroche-Gamonet V, Belin D, Piazza PV, Evidence for addiction-like behavior in the rat, Science
305 (2004) 1014–1017, 10.1126/science.1099020.
[3]. Pelloux Y, Everitt BJ, Dickinson A, Compulsive drug seeking by rats under punishment:
effects of drug taking history, Psychopharmacology (Berl) 194 (2007) 127–137,
10.1007/s00213-007-0805-0. [PubMed: 17514480]
[4]. Vanderschuren LJMJ, Everitt BJ, Drug seeking becomes compulsive after prolonged cocaine
self-administration, Science 305 (2004) 1017–1019, 10.1126/science.1098975.
[5]. Belin D, Belin-Rauscent A, Murray JE, Everitt BJ, Addiction: failure of control over maladaptive
incentive habits, Curr. Opin. Neurobiol 23 (2013) 564–572, 10.1016/j.conb.2013.01.025.
[PubMed: 23452942]
[6]. Everitt BJ, Neural and psychological mechanisms underlying compulsive drug seeking habits
and drug memories–indications for novel treatments of addiction, Eur. J. Neurosci 40 (2014)
2163–2182, 10.1111/ejn.12644. [PubMed: 24935353]
[7]. Everitt BJ, Robbins TW, Drug addiction: updating actions to habits to compulsions ten years
on, Annu. Rev. Psychol 67 (2016) 23–50, 10.1146/annurev-psych-122414-033457. [PubMed:
26253543]
[8]. Everitt BJ, Robbins TW, Neural systems of reinforcement for drug addiction: from actions to
habits to compulsion, Nat. Neurosci 8 (2005) 1481–1489, 10.1038/nn1579. [PubMed: 16251991]
[9]. Ostlund SB, Balleine BW, On habits and addiction: an associative analysis of compulsive drug
seeking, Drug Discov. Today Dis. Models 5 (2008) 235–245, 10.1016/j.ddmod.2009.07.004.
[10]. Smith RJ, Laiks LS, Behavioral and neural mechanisms underlying habitual and compulsive
drug seeking, Prog. Neuropsychopharmacol. Biol. Psychiatry 87 (2018) 11–21, 10.1016/
j.pnpbp.2017.09.003. [PubMed: 28887182]
[11]. Brown RM, Dayas CV, James MH, Smith RJ, New directions in modelling dysregulated
reward seeking for food and drugs, Neurosci. Biobehav. Rev 132 (2022) 1037–1048, 10.1016/
j.neubiorev.2021.10.043. [PubMed: 34736883]
[12]. Bouton ME, Context, attention, and the switch between habit and goal-direction in behavior,
Learn. Behav 49 (2021) 349–362, 10.3758/s13420-021-00488-z. [PubMed: 34713424]
[13]. Jones BO, Cruz AM, Kim TH, Spencer HF, Smith RJ, Discriminating goal-directed and habitual
cocaine seeking in rats using a novel outcome devaluation procedure, Learn. Mem 29 (2022)
447–457, 10.1101/lm.053621.122. [PubMed: 36621907]
[14]. Balleine BW, Dickinson A, Goal-directed instrumental action: contingency and incentive
learning and their cortical substrates, Neuropharmacology 37 (1998) 407–419, 10.1016/
s0028-3908(98)00033-1. [PubMed: 9704982]
[15]. Yin HH, Knowlton BJ, The role of the basal ganglia in habit formation, Nat. Rev. Neurosci 7
(2006) 464–476, 10.1038/nrn1919. [PubMed: 16715055]
[16]. Dickinson A, Actions and habits: the development of behavioural autonomy, Philosoph. Transact.
Royal Soc. B: Biol. Sci 308 (1985) 67–78, 10.1098/rstb.1985.0010.
[17]. McNamee D, Liljeholm M, Zika O, O’Doherty JP, Characterizing the associative content of brain
structures involved in habitual and goal-directed actions in humans: a multivariate FMRI study, J.
Neurosci 35 (2015) 3764–3771, 10.1523/JNEUROSCI.4677-14.2015. [PubMed: 25740507]
[18]. Yin HH, Knowlton BJ, Balleine BW, Inactivation of dorsolateral striatum enhances sensitivity to
changes in the action-outcome contingency in instrumental conditioning, Behav. Brain Res 166
(2006) 189–196, 10.1016/j.bbr.2005.07.012. [PubMed: 16153716]
[19]. Balleine BW, O’Doherty JP, Human and rodent homologies in action control: corticostriatal
determinants of goal-directed and habitual action, Neuropsychopharmacology 35 (2010) 48–69,
10.1038/npp.2009.131. [PubMed: 19776734]
[20]. Corbit LH, Janak PH, Posterior dorsomedial striatum is critical for both selective
instrumental and Pavlovian reward learning, Eur. J. Neurosci 31 (2010) 1312–1321, 10.1111/
j.1460-9568.2010.07153.x. [PubMed: 20345912]
[21]. Gremel CM, Costa RM, Orbitofrontal and striatal circuits dynamically encode the shift
between goal-directed and habitual actions, Nat. Commun 4 (2013) 2264, 10.1038/ncomms3264.
[PubMed: 23921250]
[22]. Yin HH, Knowlton BJ, Balleine BW, Lesions of dorsolateral striatum preserve outcome
expectancy but disrupt habit formation in instrumental learning, Eur. J. Neurosci 19 (2004)
181–189, 10.1111/j.1460-9568.2004.03095.x. [PubMed: 14750976]
[23]. Yin HH, Ostlund SB, Knowlton BJ, Balleine BW, The role of the dorsomedial
striatum in instrumental conditioning, Eur. J. Neurosci 22 (2005) 513–523, 10.1111/
j.1460-9568.2005.04218.x. [PubMed: 16045504]
[24]. Dickinson A, Nicholas DJ, Adams CD, The effect of the instrumental training contingency on
susceptibility to reinforcer devaluation, Quart. J. Experim. Psychol. Sect. B 35 (1983) 35–51,
10.1080/14640748308400912.
[25]. Zapata A, Minney VL, Shippenberg TS, Shift from goal-directed to habitual cocaine
seeking after prolonged experience in rats, J. Neurosci 30 (2010) 15457–15463,
10.1523/JNEUROSCI.4072-10.2010. [PubMed: 21084602]
Addict Neurosci. Author manuscript; available in PMC 2024 June 10.
[26]. Leong KC, Berini CR, Ghee SM, Reichel CM, Extended cocaine-seeking produces a shift
from goal-directed to habitual responding in rats, Physiol. Behav 164 (2016) 330–335,
10.1016/j.physbeh.2016.06.021. [PubMed: 27321756]
[27]. Jonkman S, Pelloux Y, Everitt BJ, Drug intake is sufficient, but conditioning is not
necessary for the emergence of compulsive cocaine seeking after extended self-administration,
Neuropsychopharmacology 37 (2012) 1612–1619, 10.1038/npp.2012.6. [PubMed: 22334124]
[28]. Bender BN, Torregrossa MM, Intermittent cocaine self-administration has sex-specific
effects on addiction-like behaviors in rats, Neuropharmacology (2023) 109490,
10.1016/j.neuropharm.2023.109490. [PubMed: 36889433]
[29]. Murray JE, Belin D, Everitt BJ, Double dissociation of the dorsomedial and dorsolateral striatal
control over the acquisition and performance of cocaine seeking, Neuropsychopharmacology 37
(2012) 2456–2466, 10.1038/npp.2012.104. [PubMed: 22739470]
[30]. Porrino LJ, Daunais JB, Smith HR, Nader MA, The expanding effects of cocaine: studies in
a nonhuman primate model of cocaine self-administration, Neurosci. Biobehav. Rev 27 (2004)
813–820, 10.1016/j.neubiorev.2003.11.013. [PubMed: 15019430]
[31]. Porrino LJ, Lyons D, Smith HR, Daunais JB, Nader MA, Cocaine self-administration produces a
progressive involvement of limbic, association, and sensorimotor striatal domains, J. Neurosci 24
(2004) 3554–3562, 10.1523/JNEUROSCI.5578-03.2004. [PubMed: 15071103]
[32]. Willuhn I, Burgeno LM, Everitt BJ, Phillips PEM, Hierarchical recruitment of phasic dopamine
signaling in the striatum during the progression of cocaine use, Proc. Natl. Acad. Sci. USA 109
(2012) 20703–20708, 10.1073/pnas.1213460109. [PubMed: 23184975]
[33]. Jonkman S, Pelloux Y, Everitt BJ, Differential roles of the dorsolateral and midlateral
striatum in punished cocaine seeking, J. Neurosci 32 (2012) 4645–4650,
10.1523/JNEUROSCI.0348-12.2012. [PubMed: 22457510]
[34]. Corbit LH, Nie H, Janak PH, Habitual alcohol seeking: time course and the contribution
of subregions of the dorsal striatum, Biol. Psychiatry 72 (2012) 389–395,
10.1016/j.biopsych.2012.02.024. [PubMed: 22440617]
[35]. Corbit LH, Nie H, Janak PH, Habitual responding for alcohol depends upon both AMPA and
D2 receptor signaling in the dorsolateral striatum, Front. Behav. Neurosci 8 (2014) 301,
10.3389/fnbeh.2014.00301. [PubMed: 25228865]
[36]. Giuliano C, Belin D, Everitt BJ, Compulsive alcohol seeking results from a failure to
disengage dorsolateral striatal control over behavior, J. Neurosci 39 (2019) 1744–1754,
10.1523/JNEUROSCI.2615-18.2018. [PubMed: 30617206]
[37]. Giuliano C, Puaud M, Cardinal RN, Belin D, Everitt BJ, Individual differences in the engagement
of habitual control over alcohol seeking predict the development of compulsive alcohol seeking
and drinking, Addict. Biol 26 (2021) e13041, 10.1111/adb.13041. [PubMed: 33955649]
[38]. Lopez MF, Becker HC, Operant ethanol self-administration in ethanol dependent mice, Alcohol
48 (2014) 295–299, 10.1016/j.alcohol.2014.02.002. [PubMed: 24721194]
[39]. Radke AK, Jury NJ, Kocharian A, Marcinkiewcz CA, Lowery-Gionta EG, Pleil KE, McElligott
ZA, McKlveen JM, Kash TL, Holmes A, Chronic EtOH effects on putative measures of
compulsive behavior in mice, Addict. Biol 22 (2017) 423–434, 10.1111/adb.12342. [PubMed:
26687341]
[40]. Adams CD, Variations in the sensitivity of instrumental responding to reinforcer devaluation,
Quart. J. Experim. Psychol. Sect. B 34 (1982) 77–98, 10.1080/14640748208400878.
[41]. Dickinson A, Balleine B, Watt A, Gonzalez F, Boakes RA, Motivational control after extended
instrumental training, Anim. Learn. Behav 23 (1995) 197–206, 10.3758/BF03199935.
[42]. Limpens JHW, Schut EHS, Voorn P, Vanderschuren LJMJ, Using conditioned suppression to
investigate compulsive drug seeking in rats, Drug Alcohol Depend 142 (2014) 314–324,
10.1016/j.drugalcdep.2014.06.037. [PubMed: 25060961]
[43]. Pelloux Y, Murray JE, Everitt BJ, Differential vulnerability to the punishment of cocaine related
behaviours: effects of locus of punishment, cocaine taking history and alternative reinforcer
availability, Psychopharmacology (Berl) 232 (2015) 125–134, 10.1007/s00213-014-3648-5.
[PubMed: 24952093]
[44]. Olmstead MC, Parkinson JA, Miles FJ, Everitt BJ, Dickinson A, Cocaine-seeking by rats:
regulation, reinforcement and activation, Psychopharmacology (Berl) 152 (2000) 123–131,
10.1007/s002130000498. [PubMed: 11057515]
[45]. Olmstead MC, Lafond MV, Everitt BJ, Dickinson A, Cocaine seeking by rats is a
goal-directed action, Behav. Neurosci 115 (2001) 394–402, 10.1037/0735-7044.115.2.394. [PubMed:
11345964]
[46]. Chen BT, Yau HJ, Hatch C, Kusumoto-Yoshida I, Cho SL, Hopf FW, Bonci A, Rescuing
cocaine-induced prefrontal cortex hypoactivity prevents compulsive cocaine seeking, Nature 496
(2013) 359–362, 10.1038/nature12024. [PubMed: 23552889]
[47]. Giuliano C, Peña-Oliver Y, Goodlett CR, Cardinal RN, Robbins TW, Bullmore ET, Belin
D, Everitt BJ, Evidence for a long-lasting compulsive alcohol seeking phenotype in rats,
Neuropsychopharmacology 43 (2018) 728–738, 10.1038/npp.2017.105. [PubMed: 28553834]
[48]. Zhou YQ, Zhang LY, Yu ZP, Zhang XQ, Shi J, Shen HW, Tropisetron facilitates footshock
suppression of compulsive cocaine seeking, Int. J. Neuropsychopharmacol 22 (2019) 574–584,
10.1093/ijnp/pyz023. [PubMed: 31125405]
[49]. Limpens JHW, Damsteegt R, Broekhoven MH, Voorn P, Vanderschuren LJMJ, Pharmacological
inactivation of the prelimbic cortex emulates compulsive reward seeking in rats, Brain Res 1628
(2015) 210–218, 10.1016/j.brainres.2014.10.045. [PubMed: 25451128]
[50]. Datta U, Martini M, Fan M, Sun W, Compulsive sucrose- and cocaine-seeking behaviors
in male and female Wistar rats, Psychopharmacology (Berl) 235 (2018) 2395–2405,
10.1007/s00213-018-4937-1. [PubMed: 29947917]
[51]. Datta U, Martini M, Sun W, Different functional domains measured by cocaine
self-administration under the progressive-ratio and punishment schedules in male Wistar rats,
Psychopharmacology (Berl) 235 (2018) 897–907, 10.1007/s00213-017-4808-1. [PubMed:
29214467]
[52]. Handel SN, Smith RJ, Making and breaking habits: revisiting the definitions and behavioral
factors that influence habits in animals, J. Exp. Anal. Behav (2023), 10.1002/jeab.889.
[53]. Smith RJ, See RE, Aston-Jones G, Orexin/hypocretin signaling at the orexin 1 receptor
regulates cue-elicited cocaine-seeking, Eur. J. Neurosci 30 (2009) 493–503,
10.1111/j.1460-9568.2009.06844.x. [PubMed: 19656173]
[54]. Ajayi AF, Akhigbe RE, Staging of the estrous cycle and induction of estrus in experimental
rodents: an update, Fertil. Res. Pract 6 (2020) 5, 10.1186/s40738-020-00074-3. [PubMed:
32190339]
[55]. Maren S, DeCola JP, Swain RA, Fanselow MS, Thompson RF, Parallel augmentation of
hippocampal long-term potentiation, theta rhythm, and contextual fear conditioning in
water-deprived rats, Behav. Neurosci 108 (1994) 44–56, 10.1037/0735-7044.108.1.44. [PubMed:
8192850]
[56]. Hogarth L, Addiction is driven by excessive goal-directed drug choice under negative affect:
translational critique of habit and compulsion theory, Neuropsychopharmacology 45 (2020)
720–735, 10.1038/s41386-020-0600-8. [PubMed: 31905368]
[57]. Singer BF, Fadanelli M, Kawa AB, Robinson TE, Are cocaine-seeking “habits” necessary
for the development of addiction-like behavior in rats? J. Neurosci 38 (2018) 60–73,
10.1523/JNEUROSCI.2458-17.2017. [PubMed: 29158359]
[58]. Thrailkill EA, Bouton ME, Contextual control of instrumental actions and habits, J. Exp.
Psychol. Anim. Learn. Cogn 41 (2015) 69–80, 10.1037/xan0000045. [PubMed: 25706547]
[59]. Holland PC, Relations between Pavlovian-instrumental transfer and reinforcer devaluation, J.
Exp. Psychol. Anim. Behav. Process 30 (2004) 104–117, 10.1037/0097-7403.30.2.104. [PubMed:
15078120]
[60]. Murray JE, Dilleen R, Pelloux Y, Economidou D, Dalley JW, Belin D, Everitt BJ, Increased
impulsivity retards the transition to dorsolateral striatal dopamine control of cocaine seeking,
Biol. Psychiatry 76 (2014) 15–22, 10.1016/j.biopsych.2013.09.011. [PubMed: 24157338]
[61]. Trask S, Shipman ML, Green JT, Bouton ME, Some factors that restore goal-direction to
a habitual behavior, Neurobiol. Learn. Mem 169 (2020) 107161, 10.1016/j.nlm.2020.107161.
[PubMed: 31927081]
[62]. Steinfeld MR, Bouton ME, Context and renewal of habits and goal-directed actions after
extinction, J. Exp. Psychol. Anim. Learn. Cogn 46 (2020) 408–421, 10.1037/xan0000247.
[PubMed: 32378909]
[63]. Bouton ME, Broomer MC, Rey CN, Thrailkill EA, Unexpected food outcomes can return a habit
to goal-directed action, Neurobiol. Learn. Mem 169 (2020) 107163, 10.1016/j.nlm.2020.107163.
[PubMed: 31927082]
[64]. Steinfeld MR, Bouton ME, Renewal of goal direction with a context change after habit learning,
Behav. Neurosci 135 (2021) 79–87, 10.1037/bne0000422. [PubMed: 33119327]
[65]. Balleine BW, Killcross AS, Dickinson A, The effect of lesions of the basolateral
amygdala on instrumental conditioning, J. Neurosci 23 (2003) 666–675,
10.1523/JNEUROSCI.23-02-00666.2003. [PubMed: 12533626]
[66]. Corbit LH, Balleine BW, The role of prelimbic cortex in instrumental conditioning, Behav. Brain
Res 146 (2003) 145–157, 10.1016/j.bbr.2003.09.023. [PubMed: 14643467]
[67]. Schwabe L, Wolf OT, Stress-induced modulation of instrumental behavior: from goal-directed
to habitual control of action, Behav. Brain Res 219 (2011) 321–328, 10.1016/j.bbr.2010.12.038.
[PubMed: 21219935]
[68]. Maier SF, Amat J, Baratta MV, Paul E, Watkins LR, Behavioral control, the medial prefrontal
cortex, and resilience, Dialogues Clin. Neurosci 8 (2006) 397–406,
10.31887/DCNS.2006.8.4/smaier. [PubMed: 17290798]
[69]. Maier SF, Watkins LR, Role of the medial prefrontal cortex in coping and resilience, Brain Res
1355 (2010) 52–60, 10.1016/j.brainres.2010.08.039. [PubMed: 20727864]
[70]. Amat J, Paul E, Watkins LR, Maier SF, Activation of the ventral medial prefrontal cortex during
an uncontrollable stressor reproduces both the immediate and long-term protective effects of
behavioral control, Neuroscience 154 (2008) 1178–1186, 10.1016/j.neuroscience.2008.04.005.
[PubMed: 18515010]
[71]. Amat J, Paul E, Zarza C, Watkins LR, Maier SF, Previous experience with behavioral
control over stress blocks the behavioral and dorsal raphe nucleus activating effects of later
uncontrollable stress: role of the ventral medial prefrontal cortex, J. Neurosci 26 (2006)
13264–13272, 10.1523/JNEUROSCI.3630-06.2006. [PubMed: 17182776]
[72]. Amat J, Baratta MV, Paul E, Bland ST, Watkins LR, Maier SF, Medial prefrontal cortex
determines how stressor controllability affects behavior and dorsal raphe nucleus, Nat. Neurosci
8 (2005) 365–371, 10.1038/nn1399. [PubMed: 15696163]
[73]. Amat J, Christianson JP, Aleksejev RM, Kim J, Richeson KR, Watkins LR, Maier SF, Control
over a stressor involves the posterior dorsal striatum and the act/outcome circuit, Eur. J. Neurosci
40 (2014) 2352–2358, 10.1111/ejn.12609. [PubMed: 24862585]
[74]. Radke AK, Nakazawa K, Holmes A, Cortical GluN2B deletion attenuates punished
suppression of food reward-seeking, Psychopharmacology (Berl) 232 (2015) 3753–3761,
10.1007/s00213-015-4033-8. [PubMed: 26223494]
[75]. Verharen JPH, van den Heuvel MW, Luijendijk M, Vanderschuren LJMJ, Adan RAH,
Corticolimbic mechanisms of behavioral inhibition under threat of punishment, J. Neurosci 39
(2019) 4353–4364, 10.1523/JNEUROSCI.2814-18.2019. [PubMed: 30902868]
[76]. Kasanetz F, Lafourcade M, Deroche-Gamonet V, Revest JM, Berson N, Balado E, Fiancette
JF, Renault P, Piazza PV, Manzoni OJ, Prefrontal synaptic markers of cocaine addiction-like
behavior in rats, Mol. Psychiatry 18 (2013) 729–737, 10.1038/mp.2012.59. [PubMed: 22584869]
[77]. Jean-Richard-Dit-Bressel P, Ma C, Bradfield LA, Killcross S, McNally GP, Punishment
insensitivity emerges from impaired contingency detection, not aversion insensitivity or reward
dominance, Elife 8 (2019), 10.7554/eLife.52765.
[78]. Ostlund SB, Balleine BW, Lesions of medial prefrontal cortex disrupt the acquisition but
not the expression of goal-directed learning, J. Neurosci 25 (2005) 7763–7770,
10.1523/JNEUROSCI.1921-05.2005. [PubMed: 16120777]
[79]. Xue Y, Steketee JD, Sun W, Inactivation of the central nucleus of the amygdala reduces the
effect of punishment on cocaine self-administration in rats, Eur. J. Neurosci 35 (2012) 775–783,
10.1111/j.1460-9568.2012.08000.x. [PubMed: 22304754]
[80]. Kawa AB, Bentzley BS, Robinson TE, Less is more: prolonged intermittent access
cocaine self-administration produces incentive-sensitization and addiction-like behavior,
Psychopharmacology (Berl) 233 (2016) 3587–3602, 10.1007/s00213-016-4393-8. [PubMed:
27481050]
[81]. James MH, Stopper CM, Zimmer BA, Koll NE, Bowrey HE, Aston-Jones G, Increased number
and activity of a lateral subpopulation of hypothalamic orexin/hypocretin neurons underlies
the expression of an addicted state in rats, Biol. Psychiatry 85 (2019) 925–935,
10.1016/j.biopsych.2018.07.022. [PubMed: 30219208]
[82]. Orsini CA, Willis ML, Gilbert RJ, Bizon JL, Setlow B, Sex differences in a rat model of
risky decision making, Behav. Neurosci 130 (2016) 50–61, 10.1037/bne0000111. [PubMed:
26653713]
[83]. Jacobs DS, Moghaddam B, Prefrontal cortex representation of learning of punishment
probability during reward-motivated actions, J. Neurosci 40 (2020) 5063–5077,
10.1523/JNEUROSCI.0310-20.2020. [PubMed: 32409619]
[84]. Epstein DH, Let’s agree to agree: a comment on Hogarth (2020), with a plea for
not-so-competing theories of addiction, Neuropsychopharmacology 45 (2020) 715–716,
10.1038/s41386-020-0618-y. [PubMed: 31969695]
[85]. Jean-Richard-Dit-Bressel P, Killcross S, McNally GP, Behavioral and neurobiological
mechanisms of punishment: implications for psychiatric disorders, Neuropsychopharmacology
43 (2018) 1639–1650, 10.1038/s41386-018-0047-3. [PubMed: 29703994]
[86]. Lüscher C, Robbins TW, Everitt BJ, The transition to compulsion in addiction, Nat. Rev.
Neurosci 21 (2020) 247–263, 10.1038/s41583-020-0289-z. [PubMed: 32231315]
[87]. Field M, Heather N, Murphy JG, Stafford T, Tucker JA, Witkiewitz K, Recovery from addiction:
behavioral economics and value-based decision making, Psychol. Addict. Behav 34 (2020)
182–193, 10.1037/adb0000518. [PubMed: 31599604]
[88]. Yager LM, Garcia AF, Donckels EA, Ferguson SM, Chemogenetic inhibition of direct pathway
striatal neurons normalizes pathological, cue-induced reinstatement of drug-seeking in rats,
Addict. Biol 24 (2019) 251–264, 10.1111/adb.12594. [PubMed: 29314464]
[89]. Pascoli V, Hiver A, Van Zessen R, Loureiro M, Achargui R, Harada M, Flakowski J, Lüscher C,
Stochastic synaptic plasticity underlying compulsion in a model of addiction, Nature 564 (2018)
366–371, 10.1038/s41586-018-0789-4. [PubMed: 30568192]
[90]. Hu Y, Salmeron BJ, Krasnova IN, Gu H, Lu H, Bonci A, Cadet JL, Stein EA, Yang Y,
Compulsive drug use is associated with imbalance of orbitofrontal- and prelimbic-striatal circuits
in punishment-resistant individuals, Proc. Natl. Acad. Sci. USA 116 (2019) 9066–9071,
10.1073/pnas.1819978116. [PubMed: 30988198]
[91]. Seiler JL, Cosme CV, Sherathiya VN, Schaid MD, Bianco JM, Bridgemohan AS, Lerner TN,
Dopamine signaling in the dorsomedial striatum promotes compulsive behavior, Curr. Biol 32
(2022) 1175–1188.e5, 10.1016/j.cub.2022.01.055. [PubMed: 35134327]
[92]. Kasanetz F, Deroche-Gamonet V, Berson N, Balado E, Lafourcade M, Manzoni O, Piazza PV,
Transition to addiction is associated with a persistent impairment in synaptic plasticity, Science
328 (2010) 1709–1712, 10.1126/science.1187801.
[93]. Bentzley BS, Jhou TC, Aston-Jones G, Economic demand predicts addiction-like behavior and
therapeutic efficacy of oxytocin in the rat, Proc. Natl. Acad. Sci. USA 111 (2014) 11822–11827,
10.1073/pnas.1406324111. [PubMed: 25071176]
[94]. James MH, Bowrey HE, Stopper CM, Aston-Jones G, Demand elasticity predicts addiction
endophenotypes and the therapeutic efficacy of an orexin/hypocretin-1 receptor antagonist in rats,
Eur. J. Neurosci 50 (2019) 2602–2612, 10.1111/ejn.14166. [PubMed: 30240516]
[95]. Jean-Richard-Dit-Bressel P, Lee JC, Liew SX, Weidemann G, Lovibond PF, McNally GP,
A cognitive pathway to punishment insensitivity, Proc. Natl. Acad. Sci. USA 120 (2023)
e2221634120, 10.1073/pnas.2221634120. [PubMed: 37011189]
Fig. 1. Experimental timeline for self-administration training, outcome devaluation testing, and
punishment.
Rats were trained to self-administer IV cocaine or food on a seeking-taking chained
schedule of reinforcement as shown by the different stages, with the requirements for
the seeking and taking levers shown for each stage, as well as the minimum number of
days required at each stage. FR, fixed ratio; RI, random interval; RR, random ratio. After
the final stage of training, rats were given outcome devaluation testing (devaluation and
nondevaluation sessions counterbalanced across two days), before and after punishment
testing. Punishment testing occurred on four days and was preceded by at least four days of
baseline self-administration and followed by at least four days of self-administration prior to
outcome devaluation testing. Created with BioRender.com.
Fig. 2. Punishment resistance for cocaine self-administration is associated with inflexible
habits.
A-B) Cocaine trials (% baseline) for the four days before, during, and after punishment
testing for male rats (A) and female rats (B) categorized as punishment resistant or sensitive.
C-D) Cocaine trials (total in 2 h) for male rats (C) and female rats (D). Labels indicate when
outcome devaluation was conducted pre- and post-punishment. E-F) Outcome devaluation
pre- and post-punishment for male rats (E) and female rats (F) that were sensitive or resistant
to punishment. Normalized lever presses are shown for nondevalued and devalued sessions.
*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
Fig. 3. Post-punishment but not pre-punishment cocaine habits are associated with punishment
resistance.
A-B) Cocaine trials (total in 2 h) during baseline seeking-taking and punishment day 4 for
male rats (A) and female rats (B) classified as goal-directed or habitual based on outcome
devaluation conducted pre-punishment. C-D) Cocaine trials for male rats (C) and female
rats (D) classified as goal-directed or habitual based on outcome devaluation conducted
post-punishment. E-F) Relationship between punishment sensitivity (% baseline trials on
Day 4; ≥65 % threshold considered resistant) and outcome devaluation (normalized devalued
responding; ≥0.4 threshold considered habitual) for male rats (E) and female rats (F), with
devaluation scores pre-punishment (dark green) and post-punishment (light green).
*p < 0.05, ****p < 0.0001.
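The two thresholds named in panels E-F can be read as a simple classification rule. The following is a minimal sketch of that rule, not the authors' analysis code. The 65 % and 0.4 cutoffs come from the caption; the `Rat` record, its field names, and the example values are hypothetical, and the normalized devalued responding score is taken as a given input because the caption does not state how it is computed.

```python
from dataclasses import dataclass

# Cutoffs taken from the figure caption.
RESISTANT_THRESHOLD = 65.0  # % of baseline trials completed on punishment day 4
HABITUAL_THRESHOLD = 0.4    # normalized devalued responding


@dataclass
class Rat:
    """Hypothetical per-animal record; field names are illustrative only."""
    rat_id: str
    baseline_trials: float  # assumed: trials per session at unpunished baseline
    day4_trials: float      # trials completed on punishment day 4
    devalued_norm: float    # normalized devalued responding (precomputed input)


def percent_baseline(rat: Rat) -> float:
    """Express punishment day 4 trials as a percentage of baseline trials."""
    if rat.baseline_trials <= 0:
        raise ValueError("baseline_trials must be positive")
    return 100.0 * rat.day4_trials / rat.baseline_trials


def classify(rat: Rat) -> tuple[str, str]:
    """Return (punishment phenotype, response strategy) for one animal."""
    punishment = (
        "resistant" if percent_baseline(rat) >= RESISTANT_THRESHOLD else "sensitive"
    )
    strategy = (
        "habitual" if rat.devalued_norm >= HABITUAL_THRESHOLD else "goal-directed"
    )
    return punishment, strategy


if __name__ == "__main__":
    # Made-up example animals, for illustration only.
    for rat in (Rat("r1", 20, 15, 0.55), Rat("r2", 20, 5, 0.20)):
        print(rat.rat_id, round(percent_baseline(rat), 1), classify(rat))
```

Both comparisons are inclusive (≥), as in the caption, so an animal exactly at a threshold is counted as resistant or habitual.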
Fig. 4. Punishment resistance for food self-administration is associated with inflexible habits,
particularly in female rats.
A-B) Reward rate for food (% baseline) for the four days before, during, and after
punishment testing for male rats (A) and female rats (B) categorized as punishment resistant
or sensitive. C-D) Reward rate (pellets/min) for male rats (C) and female rats (D). Labels
indicate when outcome devaluation was conducted pre- and post-punishment. E-F) Outcome
devaluation pre- and post-punishment for male rats (E) and female rats (F) that were
sensitive or resistant to punishment. Normalized lever presses are shown for nondevalued
and devalued sessions.
*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
Fig. 5. Pre- and post-punishment food habits are associated with punishment resistance.
A-B) Reward rate for food (pellets per min) during baseline seeking-taking and punishment
day 4 for male rats (A) and female rats (B) classified as goal-directed or habitual based
on outcome devaluation conducted pre-punishment. C-D) Reward rate for male rats (C)
and female rats (D) classified as goal-directed or habitual based on outcome devaluation
conducted post-punishment. E-F) Relationship between punishment sensitivity (% baseline
reward rate on day 4; ≥65 % threshold considered resistant) and outcome devaluation
(normalized devalued responding; ≥0.4 threshold considered habitual) for male rats (E) and
female rats (F), with devaluation scores pre-punishment (dark blue) and post-punishment
(light blue).
*p < 0.05, **p < 0.01, ****p < 0.0001.
Fig. 6. Relationship between punishment resistance and footshock sensitivity (thresholds for
flinch, jump, and vocalization) and self-administration rates.
A-B) Relationship between cocaine punishment sensitivity (% baseline trials) and footshock
sensitivity in male rats (A) and female rats (B). C-D) Relationship between food punishment
sensitivity (% baseline reward rate) and footshock sensitivity in male rats (C) and female
rats (D). E-F) Relationship between cocaine punishment sensitivity (% baseline trials) and
self-administration (cocaine trials in 2 h) during early FR1 training or final seeking-taking
sessions in male rats (E) and female rats (F). G-H) Relationship between food punishment
sensitivity (% baseline reward rate) and self-administration (food pellets per min) during
early FR1 training or final seeking-taking sessions in male rats (G) and female rats (H).
*p < 0.05, **p < 0.01.
... Habitual behavior is elicited by conditioned stimuli and insensitive to outcome devaluation, whereas goal-directed behavior is performed in direct pursuit of the outcome and sensitive to outcome devaluation (Balleine & Dickinson, 1998;Handel & Smith, 2024;. We recently showed that punishment resistance for cocaine is associated with habitual responding, i.e., insensitivity to outcome devaluation (Jones et al., 2024). Although punishment resistance was not predicted by habitual cocaine seeking, it was associated with continued use of habits, whereas punishment sensitivity was associated with increased goaldirected responding (Jones et al., 2024). ...
... We recently showed that punishment resistance for cocaine is associated with habitual responding, i.e., insensitivity to outcome devaluation (Jones et al., 2024). Although punishment resistance was not predicted by habitual cocaine seeking, it was associated with continued use of habits, whereas punishment sensitivity was associated with increased goaldirected responding (Jones et al., 2024). Given that habits are involved in punishment resistance and that RI schedules bias habitual responding, we hypothesized that the RI60 schedule of reinforcement would lead to greater punishment resistance as compared to the RR20 schedule. ...
... Given that habits are involved in punishment resistance and that RI schedules bias habitual responding, we hypothesized that the RI60 schedule of reinforcement would lead to greater punishment resistance as compared to the RR20 schedule. In our previous study, we did not observe a significant difference between RR20 and RI60 schedules in terms of punishment resistance for cocaine when comparing the final day of punishment (Jones et al., 2024); therefore, in the current study we evaluated schedules across multiple punishment days and in larger groups of rats. ...
... Interestingly in Sprague-Dawley rats, punishment resistance during cocaine selfadministration in the presence of an unpredictable 0.4mA foot shock was not predicted by habitual responding during unpunished self-administration, but rather associated with habitual responding after the negative consequence was introduced in both sexes [137]. Sensitivity to punishment in this context was related to a greater shift from habitual to goal-directed behavior in the presence of the adverse consequence [137]. ...
... Interestingly in Sprague-Dawley rats, punishment resistance during cocaine selfadministration in the presence of an unpredictable 0.4mA foot shock was not predicted by habitual responding during unpunished self-administration, but rather associated with habitual responding after the negative consequence was introduced in both sexes [137]. Sensitivity to punishment in this context was related to a greater shift from habitual to goal-directed behavior in the presence of the adverse consequence [137]. ...
... Some studies suggest that addiction-related behavior is associated with uncontrollable habits 42,43 . For example, Jones BO, et al. 44 , found that punishment resistant was related to habits that have become inflexible and persist under conditions that should encourage a transition to goal-directed behavior. Leong KC, et al. 45 suggested that the full acquisition and relapse of addiction-related behavior may be attributed to a shift away from goal-directed responding and a shift towards the maladaptive formation of rigid and habit-like responses. ...
Article
Full-text available
A primary behavioral pathology in drug addiction is the overpowering motivational strength and decreased ability to control the desire to obtain drugs, which shows some variation between different individuals. Here, using a morphine-induced conditioned place preference (CPP) model with footshock, we found that mice exhibited significant individual differences in morphine-induced addiction. Despite the consequences of footshock, a small percentage of mice (24%) still showed stable morphine preference, demonstrating resistant to punishment. The majority of mice (76%) were relatively sensitive to punishment and showed termination of morphine preference. As a region of advanced cognitive function in the mammalian brain, the medial prefrontal cortex (mPFC) is involved in regulating drug-induced addictive behaviors. We found that activating the pyramidal neurons in the prelimbic cortex (PrL) could effectively reverse morphine-induced CPP in resistant mice, and inhibiting pyramidal neurons in the PrL could promote morphine-induced CPP in sensitive mice. To further explore the differences between resistant and sensitive mice, we analyzed the differences in gene expression in their PrL regions through RNA-seq analysis. The results showed that compared to sensitive mice, the significantly downregulated differentially expressed genes (DEGs), such as Panx2, Tcf7l2, Htr2c, Htr5a, Orai3, Slc24a4 and Cacnb2, in resistant mice were mainly involved in synaptic formation and neurodevelopment. We speculated that there may be defects in the neuronal system of resistant mice, and caused they are more prone to morphine-induced CPP. These findings are likely to contribute to research in gene therapy, and they may also serve as potential therapeutic targets for drug addiction.
... But while significant, this correlation is not strong and high variability remains both across and within studies. For instance, Jones and colleagues recently assessed punishment resistance vs. sensitivity in 5 cohorts of rats, with the same experimental procedure, and found rates of resistance ranging from 31 to 63 % (Jones, Paladino, Cruz, Spencer, Kahanek, Scarborough, Georges, and Smith, 2024). This, as well as similar findings from other studies (Table 1), indicates that indiscriminately applying the same shock intensity to every rat can yield different outcomes and suggests that the line between taking drugs despite punishment and drug cessation might be blurrier that we often presuppose. ...
Article
Full-text available
Compulsive behavior is a defining feature of disorders such as substance use disorders. Current evidence suggests that corticostriatal circuits control the expression of established compulsions, but little is known about the mechanisms regulating the development of compulsions. We hypothesized that dopamine, a critical modulator of striatal synaptic plasticity, could control alterations in corticostriatal circuits leading to the development of compulsions (defined here as continued reward seeking in the face of punishment). We used dual-site fiber photometry to measure dopamine axon activity in the dorsomedial striatum (DMS) and the dorsolateral striatum (DLS) as compulsions emerged. Individual variability in the speed with which compulsions emerged was predicted by DMS dopamine axon activity. Amplifying this dopamine signal accelerated animals’ transitions to compulsion, whereas inhibition delayed it. In contrast, amplifying DLS dopamine signaling had no effect on the emergence of compulsions. These results establish DMS dopamine signaling as a key controller of the development of compulsive reward seeking.
Article
Full-text available
This article reviews recent findings from the author’s laboratory that may provide new insights into how habits are made and broken. Habits are extensively practiced behaviors that are automatically evoked by antecedent cues and performed without their goal (or reinforcer) “in mind.” Goal-directed actions, in contrast, are instrumental behaviors that are performed because their goal is remembered and valued. New results suggest that actions may transition to habit after extended practice when conditions encourage reduced attention to the behavior. Consistent with theories of attention and learning, a behavior may command less attention (and become habitual) as its reinforcer becomes well-predicted by cues in the environment; habit learning is prevented if presentation of the reinforcer is uncertain. Other results suggest that habits are not permanent, and that goal-direction can be restored by several environmental manipulations, including exposure to unexpected reinforcers or context change. Habits are more context-dependent than goal-directed actions are. Habit learning causes retroactive interference in a way that is reminiscent of extinction: It inhibits, but does not erase, goal-direction in a context-dependent way. The findings have implications for the understanding of habitual and goal-directed control of behavior as well as disordered behaviors like addictions.
Article
Full-text available
Excessive drinking is an important behavioural characteristic of alcohol addiction, but not the only one. Individuals addicted to alcohol crave alcoholic beverages, spend time seeking alcohol despite negative consequences and eventually drink to intoxication. With prolonged use, control over alcohol seeking devolves to anterior dorsolateral striatum, dopamine‐dependent mechanisms implicated in habit learning and individuals in whom alcohol seeking relies more on these mechanisms are more likely to persist in seeking alcohol despite the risk of punishment. Here, we tested the hypothesis that the development of habitual alcohol seeking predicts the development of compulsive seeking and that, once developed, it is associated with compulsive alcohol drinking. Male alcohol‐preferring rats were pre‐exposed intermittently to a two‐bottle choice procedure and trained on a seeking–taking chained schedule of alcohol reinforcement until some individuals developed punishment‐resistant seeking behaviour. The associative basis of their seeking responses was probed with an outcome‐devaluation procedure, early or late in training. After seeking behaviour was well established, subjects that had developed greater resistance to outcome devaluation (were more habitual) were more likely to show punishment‐resistant (compulsive) alcohol seeking. These individuals also drank more alcohol, despite quinine adulteration, even though having similar alcohol preference and intake before and during instrumental training. They were also less sensitive to changes in the contingency between seeking responses and alcohol outcome, providing further evidence of recruitment of the habit system. We therefore provide direct behavioural evidence that compulsive alcohol seeking emerges alongside compulsive drinking in individuals who have preferentially engaged the habit system.
Article
An instrumental action can be goal-directed after a moderate amount of practice and then convert to habit after more extensive practice. Recent evidence suggests, however, that habits can return to action status after different environmental manipulations. The present experiments therefore asked whether habit learning interferes with goal direction in a context-dependent manner like other types of retroactive interference (e.g., extinction, punishment, counterconditioning). In Experiment 1, rats were given a moderate amount of instrumental training to form an action in one context (Context A) and then more extended training of the same response to form a habit in another context (Context B). We then performed reinforcer devaluation with taste aversion conditioning in both contexts, and tested the response in both contexts. The response remained habitual in Context B, but was goal-directed in Context A, indicating renewal of goal direction after habit learning. Experiment 2 expanded on Experiment 1 by testing the response in a third context (Context C). It found that the habitual response also renewed as action in this context. Together, the results establish a parallel between habit and extinction learning: Conversion to habit does not destroy action knowledge, but interferes with it in a context-specific way. They are also consistent with other results suggesting that habit is specific to the context in which it is learned, whereas goal-direction can transfer between contexts.
Article
Habits have garnered significant interest in studies of associative learning and maladaptive behavior. However, habit research has faced scrutiny and challenges related to the definitions and methods. Differences in the conceptualizations of habits between animal and human studies create difficulties for translational research. Here, we review the definitions and commonly used methods for studying habits in animals and humans and discuss potential alternative ways to assess habits, such as automaticity. To better understand habits, we then focus on the behavioral factors that have been shown to make or break habits in animals, as well as potential mechanisms underlying the influence of these factors. We discuss the evidence that habitual and goal‐directed systems learn in parallel and that they seem to interact in competitive and cooperative manners. Finally, we draw parallels between habitual responding and compulsive drug seeking in animals to delineate the similarities and differences in these behaviors.
Article
Individuals differ in their sensitivity to the adverse consequences of their actions, leading some to persist in maladaptive behaviors. Two pathways have been identified for this insensitivity: a motivational pathway based on excessive reward valuation and a behavioral pathway based on autonomous stimulus-response mechanisms. Here, we identify a third, cognitive pathway based on differences in punishment knowledge and use of that knowledge to suppress behavior. We show that distinct phenotypes of punishment sensitivity emerge from differences in what people learn about their actions. Exposed to identical punishment contingencies, some people (sensitive phenotype) form correct causal beliefs that they use to guide their behavior, successfully obtaining rewards and avoiding punishment, whereas others form incorrect but internally coherent causal beliefs that lead them to earn punishment they do not like. Incorrect causal beliefs were not inherently problematic because we show that many individuals benefit from information about why they are being punished, revaluing their actions and changing their behavior to avoid further punishment (unaware phenotype). However, one condition where incorrect causal beliefs were problematic was when punishment was infrequent. Under this condition, more individuals show punishment insensitivity and detrimental patterns of behavior that resist experience and information-driven updating, even when punishment is severe (compulsive phenotype). For these individuals, rare punishment acted as a "trap," inoculating maladaptive behavioral preferences against cognitive and behavioral updating.
Article
Intermittent access (IntA) models of cocaine self-administration were developed to better model in rodents how cocaine is used by human drug users. Compared to traditional continuous access (ContA) models, IntA has been shown to enhance several pharmacological and behavioral effects of cocaine, but few studies have examined sex differences in IntA. Moreover, no one has examined the efficacy of cue extinction to reduce cocaine seeking in the IntA model, which has previously been shown to be ineffective in other models that promote habit-like cocaine seeking. Therefore, rats were implanted with jugular vein catheters and dorsolateral striatum (DLS) cannulae and trained to self-administer cocaine paired with an audiovisual cue with ContA or IntA. In subsets of rats, we evaluated: the ability of Pavlovian cue extinction to reduce cue-induced drug seeking; motivation for cocaine using a progressive ratio procedure; punishment-resistant cocaine taking by pairing cocaine infusions with footshocks; and dependence of drug-seeking on DLS dopamine (a measure of habit-like behavior) with the dopamine antagonist cis-flupenthixol. Overall, cue extinction reduced cue-induced drug seeking after ContA or IntA. Compared to ContA, IntA resulted in increased motivation for cocaine exclusively in females, but IntA facilitated punished cocaine self-administration exclusively in males. After 10 days of IntA training, but not fewer, drug-seeking was dependent on DLS dopamine most notably in males. Our results suggest that IntA may be valuable for identifying sex differences in the early stages of drug use and provide a foundation for the investigation of the mechanisms involved.
Article
Habits are theorized to play a key role in compulsive cocaine seeking, yet there is limited methodology for assessing habitual responding for intravenous (IV) cocaine. We developed a novel outcome devaluation procedure to discriminate goal-directed from habitual responding in cocaine-seeking rats. This procedure elicits devaluation temporarily and requires no additional training, allowing repeated testing at different time points. After training male rats to self-administer IV cocaine, we devalued the drug outcome via experimenter-administered IV cocaine (a “satiety” procedure) prior to a 10-min extinction test. Many rats were sensitive to outcome devaluation, a hallmark of goal-directed responding. These animals reduced responding when given a dose of experimenter-administered cocaine that matched or exceeded satiety levels during self-administration. However, other rats were insensitive to experimenter-administered cocaine, suggesting their responding was habitual. Importantly, reinforcement schedules and neural manipulations that produce goal-directed responding (i.e., ratio schedules or dorsolateral striatum lesions) caused sensitivity to outcome devaluation, whereas reinforcement schedules and neural manipulations that produce habitual responding (i.e., interval schedules or dorsomedial striatum lesions) caused insensitivity. Satiety-based outcome devaluation is an innovative new tool to dissect the neural and behavioral mechanisms underlying IV cocaine-seeking behavior.
Article
Behavioral models are central to behavioral neuroscience. To study the neural mechanisms of maladaptive behaviors (including binge eating and drug addiction), it is essential to develop and utilize appropriate animal models that specifically focus on dysregulated reward seeking. Both food and cocaine are typically consumed in a regulated manner by rodents, motivated by reward and homeostatic mechanisms. However, both food and cocaine seeking can become dysregulated, resulting in binge-like consumption and compulsive patterns of intake. The speakers in this symposium for the 2021 International Behavioral Neuroscience Meeting utilize behavioral models of dysregulated reward-seeking to investigate the neural mechanisms of binge-like consumption, enhanced cue-driven reward seeking, excessive motivation, and continued use despite negative consequences. In this review, we outline examples of maladaptive patterns of intake and explore recent animal models that drive behavior to become dysregulated, including stress exposure and intermittent access to rewards. Lastly, we explore select behavioral and neural mechanisms underlying dysregulated reward-seeking for both food and drugs.
Article
Actions executed toward obtaining a reward are frequently associated with the probability of harm occurring during action execution. Learning this probability allows for appropriate computation of future harm to guide action selection. Impaired learning of this probability may be critical for the pathogenesis of anxiety or reckless and impulsive behavior. Here we designed a task for punishment probability learning during reward-guided actions to begin to understand the neuronal basis of this form of learning, and the biological or environmental variables that influence action selection after learning. Male and female Long-Evans rats were trained in a seek-take behavioral paradigm where the seek action was associated with varying probability of punishment. The take action remained safe and was followed by reward delivery. Learning was evident as subjects selectively adapted seek action behavior as a function of punishment probability. Recording of neural activity in the medial prefrontal cortex (mPFC) during learning revealed changes in phasic mPFC neuronal activity during risky seek actions but not during the safe take actions or reward delivery, revealing that this region is involved in learning of probabilistic punishment. After learning, the variables that influenced behavior included reinforcer and punisher value, pretreatment with the anxiolytic diazepam, and sex. In particular, females were more sensitive to probabilistic punishment than males. These data demonstrate that flexible encoding of risky actions by mPFC is involved in probabilistic punishment learning and provide a novel behavioral approach for studying the pathogenesis of anxiety and impulsivity with inclusion of sex as a biological variable.
Significance Statement: Actions we choose to execute toward obtaining a reward are often associated with the probability of harm occurring. Impaired learning of this probability may be critical for the pathogenesis of anxiety or reckless behavior and impulsivity. We developed a behavioral procedure to assess this mode of learning. This procedure allowed us to determine biological and environmental factors that influence the resistance of reward seeking to probabilistic punishment and to identify the medial prefrontal cortex as a region that flexibly adapts its response to risky actions as contingencies are learned.