Dopamine neurons encode the better option in rats deciding
between differently delayed or sized rewards
Matthew R Roesch1,4, Donna J Calu2,4, and Geoffrey Schoenbaum1,3
1Department of Anatomy and Neurobiology, University of Maryland School of Medicine, 20 Penn Street,
HSF-2 S251, Baltimore, Maryland 21201, USA
2Program in Neuroscience, University of Maryland School of Medicine, 20 Penn Street, HSF-2 S251,
Baltimore, Maryland 21201, USA
3Department of Psychiatry, University of Maryland School of Medicine, 20 Penn Street, HSF-2 S251,
Baltimore, Maryland 21201, USA
Abstract
The dopamine system is thought to be involved in making decisions about reward. Here we recorded
from the ventral tegmental area in rats learning to choose between differently delayed and sized
rewards. As expected, the activity of many putative dopamine neurons reflected reward prediction
errors, changing when the value of the reward increased or decreased unexpectedly. During learning,
neural responses to reward in these neurons waned and responses to cues that predicted reward
emerged. Notably, this cue-evoked activity varied with size and delay. Moreover, when rats were
given a choice between two differently valued outcomes, the activity of the neurons initially reflected
the more valuable option, even when it was not subsequently selected.
Dopamine is thought to be essential for reinforcement learning1–4. This hypothesis is rooted
in single-unit studies, which show that the firing patterns of dopamine neurons accord well
with a signal that reports errors in reward prediction5–15. This is true of firing for rewards as
well as for cues that come to predict rewards. This signal is thought to regulate learning and
to guide decision-making; however, few studies have investigated this issue in behavioral tasks
that involve active decision-making11,14.
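In temporal-difference terms (the standard formalism behind the studies cited above; the notation here is generic rather than taken from this paper), the error signal at time t is

    δ_t = r_t + γ·V(s_{t+1}) − V(s_t),

where r_t is the reward received, V(s) is the learned value of the current state or cue, and γ is a discount factor. Outcomes better than predicted (positive errors) transiently increase dopamine firing, whereas outcomes worse than predicted (negative errors) transiently suppress it.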
To address this issue, we recorded single-unit activity in the ventral tegmental area (VTA) of
rats performing a variant of a time-discounting task. It is known that animals prefer immediate
over delayed reward (time discounting)16–25, but the role of dopamine in this impulsive
behavior remains unclear26–30. We designed a task to isolate changes in neural activity related
to reward size from changes in neural activity related to time to reward by independently
varying when (after a short or long delay) or what (big or small) reward was to be delivered at
the end of the trial. As the rats learned to respond to the more valuable odor cue, many dopamine
neurons developed cue-selective activity during odor sampling, firing more strongly for cues
that predicted either the more immediate reward or the larger one. Notably, these same neurons
exhibited activity that was consistent with prediction errors during unexpected delivery or
omission of reward. When the rats were given the opportunity to choose between two
differently valued rewards, the cue-evoked activity in the dopamine neurons initially reflected
the value of the best available option. This was true even when that option was not subsequently
selected. Only after cue-offset (after the decision was made) did this activity change to reflect
the value of the option that would eventually be chosen.

Correspondence should be addressed to M.R.R. (mroes001@umaryland.edu).
4These authors contributed equally to this work.
AUTHOR CONTRIBUTIONS M.R.R., D.J.C. and G.S. conceived the experiments. M.R.R. and D.J.C. carried out the recording work and assisted with electrode construction, surgeries and histology. The data were analyzed by M.R.R. and G.S., who also wrote the manuscript with assistance from D.J.C.
Note: Supplementary information is available on the Nature Neuroscience website.
Published in final edited form as: Nat Neurosci. 2007 December; 10(12): 1615–1624. doi:10.1038/nn2013.
RESULTS
Neurons were recorded in a delay discounting task31 (Fig. 1a,b). On each trial, rats responded
to one of two adjacent wells after sampling an odor at a central port (Fig. 1c). Rats were trained
to respond to three odor cues: one odor that signaled reward in the right well (forced choice),
a second odor that signaled reward in the left well (forced choice), and a third odor that signaled
reward at either well (free choice). Across blocks of trials in each recording session, we
manipulated either the length of the delay that preceded reward delivery (Fig. 1a; blocks 1,2)
or the size of the reward (Fig. 1b; blocks 3,4). All trials were normalized to be the same duration
by adjusting the intertrial interval. As shown in Figure 1d–f, the rats changed their behavior
on both free and forced-choice trials across these training blocks, choosing the higher value
reward more often on free-choice trials (Fig. 1e,f) and with greater accuracy and shorter latency
on forced-choice trials (Fig. 1d). Thus the rats perceived the differently delayed and sized
rewards as having different values and could rapidly learn to change their behavior within each
trial block.
We recorded 258 neurons from the VTA and substantia nigra pars compacta (SN) in nine rats
during these sessions (Fig. 2a). Waveforms from these neurons (Fig. 2b) were analyzed for
features characteristic of dopamine neurons12–14,32,33 (Fig. 2c). This analysis identified 36
neurons that met established electrophysiological criteria for dopamine neurons (Fig. 2c),
including 2 in the SN (see Supplementary Data online for analyses of SN cells) and 34 in the
VTA. To confirm the validity of the waveform criteria, we recorded an additional 18 neurons
before and after intravenous infusion of the dopamine agonist apomorphine, which inhibits the
activity of dopamine neurons13,34,35. Neurons whose firing was suppressed in response to
apomorphine clustered with the 36 putative dopamine neurons that were identified by the
waveform analysis (Fig. 2c).
Prediction error signaling in cue-responsive neurons
The firing activity in 33 of the 34 dopaminergic neurons in the VTA was significantly
modulated during odor sampling (compared with baseline; t-test; P < 0.05). Consistent with
previous reports2,5–13, this population included many neurons (n = 19) that increased firing
in response to reward. Notably, activity in the 19 dopamine neurons that responded to both the
cue and the reward (cue/reward-responsive dopamine neurons) reflected the positive and negative
reward prediction errors that are inherent in our task design, whereas activity in the remaining
14 cue-responsive neurons did not. This was most apparent in the transition between blocks 3
and 4 (Fig. 1b); at this transition, the reward in one well is made larger, by delivery of several
unexpected boluses of sucrose, and the reward in the other well is made smaller, by omission
of several expected sucrose boluses. Many of the cue/reward-responsive dopamine neurons
showed suppressed firing when expected boluses were omitted and increased firing when
unexpected boluses were delivered. These shifts in firing were maximal immediately after the
transition and diminished with learning. These effects are illustrated by the single-unit example
shown in Figure 3a.
Signaling of prediction errors can also be appreciated in the heat plot in Figure 4b, which shows
the population response of the 19 cue/reward-responsive dopamine neurons on the first and
last 10 forced-choice trials in each direction, for each of the four training blocks shown in
Figure 1a. Positive prediction errors occurred at the beginning of every trial block in which the
reward at a particular well increased in value. This occurred at the transition from ‘small’ to
‘big’ (Fig. 4b, 4bg, white arrow) and at transitions from ‘long’ to ‘short’ (Fig. 4b, 2sh, white
arrow) and from ‘long’ to ‘big’ (Fig. 4b, 3bg, white arrow). In each case, neural activity
increased significantly during the initial trials of the new block, when reward was delivered
unexpectedly, and then declined, as the rats learned to expect the reward. The decline in activity
across these trial blocks is illustrated for the individual neurons in Fig. 3b (‘reward delivery’),
which plots the average firing in each neuron during the 500 ms after reward delivery in the
first five and the last fifteen trials of these training blocks (2sh, 4bg, 3bg in Fig. 4b). These
neurons were significantly more likely to fire more in early trials than in late trials (Fig. 3b;
chi-square, P < 0.05), and neurons with a statistically significant decline in activity
outnumbered those that showed an increase (Fig. 3b; chi-square, P < 0.0017). Furthermore,
activity averaged across the entire population declined significantly within these trial blocks
(Fig. 3b; t-test, P < 0.007).
Negative prediction errors are also seen in Figure 4b, at the beginning of trial blocks in which
an expected reward was omitted. This occurred at the transition from ‘big’ to ‘small’ (Fig. 4b,
4sm, gray arrow) and at the transition from ‘short’ to ‘long’ at the time when the short reward
would have been delivered (see Supplementary Data online). In each case, neural activity
declined significantly at the time of reward omission (for example, at the time when the reward
would have been delivered on ‘short’ trials or at the time when the second bolus would have
been delivered on ‘big’ trials), and this decline was greatest in the initial trials and then lessened
through the remaining trials in each block, as the rats learned to expect reward omission. This
is illustrated for individual neurons in Figure 3b (‘reward omission’), which plots the average
firing for each neuron during the 500-ms period after reward omission in the first five and last
fifteen trials of these training blocks (2lo, 4sm in Fig. 4b). These neurons were significantly
more likely to fire more in late trials than in early trials (Fig. 3b; chi-square, P < 0.001), and
neurons with a statistically significant increase in activity outnumbered those that showed a
decrease (Fig. 3b; chi-square, P < 0.02). Furthermore, activity averaged across the entire
population increased significantly within these trial blocks (Fig. 3b; t-test, P < 0.009).
By contrast, the 14 cue-responsive dopamine neurons that did not respond to reward showed
no evidence of prediction error signaling. This is illustrated in Figure 3c, which plots the
average firing in response to reward in these neurons early and late in training blocks in which
the reward at a particular well increased (Fig. 3c, ‘reward delivery’) or decreased (Fig. 3c,
‘reward omission’) in value unexpectedly. Unlike cue/reward-responsive neurons, these
putative dopamine neurons showed no change in firing in response to changes in expected
reward. Indeed, signals related to prediction errors were not observed in any of the other
populations identified in Figure 2c (see Supplementary Fig. 1–Supplementary Fig. 3 online).
Effect of delay and reward size on cue-evoked activity
The reward- and nonreward-responsive dopamine neurons also differed in what information
they encoded during cue-sampling. Consistent with their role in signaling prediction errors,
the cue/reward-responsive dopamine neurons fired on the basis of the value of the reward that
a cue predicted. In each trial block, learning was associated with increased firing in response
to the cue that predicted the more valuable reward and decreased firing in response to the cue
that predicted the less valuable reward. This is evident in the single-unit example shown in
Figure 4a, and in the population response shown in Figure 4b.
Notably, the learning-related increase in firing in response to the cue that predicted the more
valuable reward was seen in both ‘size’ and ‘delay’ blocks (Fig. 4c,d). The correlation between size
and delay coding is further illustrated in Figure 5a, which plots the difference in cue-evoked activity
between high and low value trials for each cue/reward-responsive neuron in the first five and
last fifteen trials in each block. This difference score was calculated separately for size (big
minus small) and delay (short minus long) blocks. During the early trials in each block, these
scores were seldom different from zero (Fig. 5a), indicating that the value of the reward that
the cues predicted caused little or no difference in firing; during the late trials in each trial
block, these scores were typically above zero (Fig. 5a), indicating greater firing in response to
the cue that predicted the more valuable reward. Of the 19 cue/reward-responsive neurons, 9
(47%) showed significant enhancement of firing (t-test, P < 0.05) under short and big
conditions (black dots) as compared with long and small conditions, respectively. Of the
remaining neurons, one neuron fired significantly more strongly for short alone (compared to
long; t-test, P < 0.05; blue dots) and three showed significant enhancement under big conditions
alone (compared to small; t-test, P < 0.05; green dots). None exhibited a significant preference
for less valuable rewards (long > short or small > big; t-test, P > 0.05). Furthermore, difference
scores from blocks in which value differed owing to delay and reward size were highly
correlated (Fig. 5a). Notably, the changes in cue-evoked firing were proportional to the delay
and were not related to the rate of titration on delay trials (Supplementary Data).
By contrast, the 14 cue-responsive neurons that did not respond to reward showed no evidence
of value encoding during cue sampling. This is evident in Figure 5c, which shows that not a
single one of these neurons fired differently depending on the value of the predicted reward.
Notably, cue-selective activity in the cue/reward-responsive neurons reflected the relative
rather than absolute value of the rewards that were available in a given block. This is most
apparent in a comparison of firing in the ‘short’ reward (2sh) and ‘small’ reward (3sm)
conditions. Even though the same odor was presented in both blocks, and it predicted the same
amount of reward, these neurons fired more to the cue when it predicted a ‘short’ reward (n =
13; 68%) than when it predicted a ‘small’ reward (n = 0) (Fig. 5b; chi-square, P < 0.001). That
is, although the absolute value of the reward in these two blocks of trials was the same (the
‘short’ and ‘small’ rewards were identical), the neurons responded more to this cue when it
predicted immediate reward versus delayed reward than when it predicted a small versus a big
reward. This effect was not found in the 14 cue-responsive dopamine neurons that did not
respond to reward (Fig. 5d; chi-square, P = 0.593).
Finally, as expected from previous reports, activity in the cue/reward-responsive dopamine
neurons did not depend on cue identity or response direction. To illustrate this, we computed
for each neuron the value index (high − low)/(high + low) during odor sampling for each
direction. The value selectivity was strongly correlated across direction (r2 = 0.598; P < 0.001),
indicating that dopamine neurons encoded value for responses made in both directions
(Supplementary Fig. 4 online).
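As a concrete illustration of this index (the firing rates below are hypothetical and serve only to show the arithmetic, not actual data from this study):

    # Hypothetical cue-evoked firing rates (spikes/s) for one neuron, one response direction.
    high_rate = 12.0   # firing to the cue predicting the more valuable reward
    low_rate = 4.0     # firing to the cue predicting the less valuable reward
    value_index = (high_rate - low_rate) / (high_rate + low_rate)
    # value_index = 0.5; positive values indicate stronger firing to the higher-value cue.
    # The index was computed separately for leftward and rightward responses and then
    # correlated across the two directions.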
Effect of available options on cue-evoked activity
Few studies have investigated the activity of dopamine neurons in the context of decision-
making. Popular computational theories propose that dopamine neurons should encode the
value of the best available option (Q-learning) to promote optimal learning, even when less
valuable options are subsequently selected36. Other ‘Actor-Critic’ models suggest that
dopamine neurons should report the value of the cue averaged over the available options.
However, at least one recent study has presented evidence contrary to both these ideas, showing
that cue-evoked activity in dopamine neurons directly reflects the value of the option that will
subsequently be chosen, a result that supports learning of Q-values by a SARSA (state-action-
reward-state-action) type of learning algorithm rather than the more conventional Q-learning
or Actor-Critic algorithms14.
To determine whether signaling of cue value by dopamine neurons in our study also depended
on the value of the option selected we compared cue-evoked activity in the cue/reward-
responsive neurons on free-choice versus forced-choice trials. Free-choice trials were similar
to the forced-choice trials in that the value of the rewards available at the two wells was the
same; the only difference was that both rewards were simultaneously available, so the rat had
the opportunity to exploit its previous knowledge and select the more valuable reward or
explore the less valuable alternative. The availability of both rewards was signaled by
presentation of a third ‘free-choice’ odor cue. This cue was presented throughout training on
7/20 trials, so the rats had the same exposure to this cue as to the other two cues and had the
same opportunity to learn about the rewards predicted by this cue over the course of each trial
block.
To control for learning, we analyzed data from trials after behavior reflected the contingencies
in the current block (>50% choice of more valuable option), so that our analysis would not be
contaminated by responses based on the contingencies from the preceding trial block.
Furthermore, to control for the possibility that low-value choices might still be more frequent
early during this block of trials, we paired each free-choice trial with the immediately preceding
and following forced-choice trial of the same value. The average population responses on these
trials, collapsed across direction, are shown separately for size and delay blocks in Figure 6.
Consistent with the results already described (Fig. 4 and Fig. 5), cue-evoked activity was higher
on the forced-choice trials when the cue predicted the high-value reward than when it predicted
the low-value reward (Fig. 6a,c). However, on free-choice trials, this difference did not exist
(Fig. 6b,d) during cue sampling (gray bar; 500 ms). Instead, cue-evoked activity on free-choice
trials was the same as that on the high-value forced-choice trials, regardless of whether the rat
ultimately chose the high-value reward or the low-value reward.
Statistical comparisons of activity during cue sampling are shown in Figure 6. Activity was
significantly stronger on free-choice trials when the low-value reward was selected than on
forced-choice trials that ended in presentation of the same low-value reward (t-test, P’s <
0.007). Furthermore, cue-evoked activity on free-choice, low-value trials was not significantly
different from cue-evoked activity on forced-choice trials that ended in presentation of the
high-value reward (t-test, Ps > 0.1). Thus, during cue sampling on free-choice trials, dopamine
neurons signaled the value of the best available option, even when it was not ultimately selected.
We found similar effects when we compared cue-evoked activity on correct and incorrect
forced-choice trials (Supplementary Fig. 5 online).
Of course, activity in these neurons did ultimately change to reflect the value of the reward
that was chosen on low-value free-choice trials. This transition occurred within the first 100
ms after odor offset for both delay and size blocks (t-test, P < 0.05). This time period
immediately precedes movement from the odor port to the fluid well, indicating that the activity
of these neurons might change to reflect the unfolding of the decision process on free-choice
trials.
The simplest explanation for the unique profile of activity on low-value free-choice trials is
that the dopamine neurons initially responded to the value of the best available reward,
regardless of whether this outcome was subsequently selected, and then changed to reflect the
value of the chosen option, once that decision had been made. However, several other possible
explanations for the difference in neural activity between free- and forced-choice trials must
be considered. One possible explanation is that rats might have taken longer to decide where
to respond on free-choice trials than on forced-choice trials, thereby altering the time course
of the dopamine signal on free-choice trials. A second possible explanation is that the free-
choice odor elicited a stronger dopaminergic response—even on trials when the rat selected
the low-value reward—because it typically led to acquisition of the better outcome, and
therefore, was more motivating.
To test whether these factors contributed to our results, we computed the reaction time for each
trial from the time of odor offset to the animal’s decision to exit the odor port. This measure
would be influenced by a number of variables relevant to the two possible explanations
described above, including the difficulty of the decision and the motivation level of the rat. For
example, if free-choice trials were simply more difficult or required more processing time, then
reaction times on free-choice trials should be longer. Similarly if the free-choice odor was more
motivating, then reaction times on free-choice trials should be shorter.
Contrary to both of these predictions, we found that the rats showed similar reaction times on
free- and forced- choice trials of the same value (Fig. 6). A two-factor ANOVA (value × trial
type) revealed a main effect of value (F1,18 = 14.6, P < 0.0003), but neither a main effect of trial
type nor an interaction (F’s < 0.15, P’s > 0.7). Subsequent contrast testing showed significant
differences between differently valued trials, regardless of trial type (P’s < 0.017), whereas
there were no significant differences in any comparisons between similarly valued trials (P’s
> 0.675). Thus, the high firing rates on low-value, free-choice trials did not seem to reflect
higher motivation or better learning for the odor cue that signaled these trials, nor did the
difference in activity on free- versus forced-choice trials reflect any difference in the time course
of responding on these two trial types. Notably, the difference in reaction times on high-value
and low-value free-choice trials also indicates that the rats knew the likely outcomes of their
responses, even when they chose the less valuable reward.
DISCUSSION
We monitored the activity of dopamine neurons in rats performing a time-discounting task in
which we independently manipulated the timing and size of rewards. Delay length and reward
size were counter-balanced across spatial location and cue identity within each session, and
the rats changed their behavior as we manipulated these variables, reliably showing their
preference for big over small and immediate over delayed rewards.
As described in primates, dopamine neurons fired more strongly for cues that predicted larger
rewards11,32. Here we replicated those findings in rats and also showed that the same
dopamine neurons encode the relative value of an immediate versus delayed reward. This
common representation is interesting because choice of the big over the small reward led to
more sucrose over the course of the recording, whereas selection of the immediate over the
delayed reward did not. Thus, encoding of prediction errors in VTA dopamine neurons reflects
the subjective or relative value of rewards rather than their actual value. This is consistent with
studies in other settings in primates32.
Generally, changes in firing in response to delayed rewards might reflect any of a number of
variables. For example, delayed rewards might be discounted because of the cost of lost
opportunities or because of the uncertainty of the future reward. Although the delayed reward
was always delivered in our task, future rewards are thought to be inherently uncertain. In
addition, the timing of the delayed reward was titrated to discourage but not eliminate
responding. This was necessary so that we could compare neural activity at the two wells on
free-choice trials in each block. However, titrating the delay caused the timing of delayed
reward to be less consistent than that of the immediate reward, particularly at the beginning of
each delay block, when the time to reward on one side increased rapidly to its new value.
Comparison of cue-evoked activity on trials of different delays showed that activity was
inversely related to delay length and was not related to the frequency of titration
(Supplementary Fig. 6). Thus it seems unlikely that the influence of delayed reward on neural
activity observed here is an artifact of titration per se, although changes in activity might still
reflect the more general uncertainty that is inherent in delayed rewards. Whatever the
underlying cause, the reduced signaling by dopamine neurons would presumably result in
weaker learning for cues that predicted delayed reward over time, leading to apparently
impulsive responding for immediate reward.
We also investigated neural activity on trials in which rats could choose to respond for either
the high-value or the low-value reward. Reinforcement learning models suggest a number of
different ways in which dopamine neurons could represent value in this situation: (i) dopamine
neurons could report the value of the option that the animal is going to select (Q value achieved
by SARSA learning), (ii) dopamine neurons could represent the average value of all available
options (V-learning) or (iii) dopamine neurons could report the value of the best possible option
independent of which is ultimately selected (Q-learning). It is unclear which of these
predictions is supported by neural data.
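To make these alternatives concrete, the sketch below shows the cue-evoked teaching-signal target each scheme would prescribe on a free-choice trial in which the rat ends up selecting the lower-value option. It is a minimal illustration under assumed numbers (the learned values and the 70/30 choice split are placeholders loosely based on the behavior described below), not an implementation of any analysis in this paper.

    # Hypothetical learned values of the two outcomes available on a free-choice trial.
    q = {"high": 1.0, "low": 0.4}
    chosen = "low"        # the rat explores the less valuable alternative
    p_high = 0.70         # assumed probability of choosing the high-value option

    sarsa_signal = q[chosen]                                  # (i) value of the action taken      -> 0.4
    v_signal = p_high * q["high"] + (1 - p_high) * q["low"]   # (ii) choice-weighted average value -> 0.82
    q_learning_signal = max(q.values())                       # (iii) best available option        -> 1.0

    # The pattern reported here resembles (iii): on free-choice trials, cue-evoked activity
    # matched that on high-value forced-choice trials regardless of the option ultimately chosen.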
Our results are clearly inconsistent with the first alternative, known as Q-value or SARSA,
which predicts that cue-evoked activity on free-choice and forced-choice trials should be
similar for trials that end in selection of the same reward. Though true for high-value trials,
this was not true for the low-value trials. On low-value trials, activity was much higher on free-
choice than on forced-choice trials. Indeed, activity on low-value, free-choice trials did not
differ statistically from activity on high-value, forced-choice trials.
Our results could be viewed as consistent with the second alternative, known as V-learning,
as this model predicts that cue-evoked activity on free-choice trials should be the same,
regardless of which reward is ultimately selected. In accord with the predictions of this model,
we found that activity was the same on free-choice trials, regardless of subsequent behavior.
However this model also indicates that activity should reflect the average ‘Pavlovian’ values
of the cues. In our task, the rats received the high-value reward on more than 99% of the high-
value, forced-choice trials. By contrast, they selected the high-value reward on only ~ 70% of
the free-choice trials, opting for the low-value reward on the other ~ 30% of the trials. Thus
the ‘Pavlovian’ value of the free-choice cue, averaged proportionally across the available
rewards, should have been lower than that of the high-value forced-choice cue. Yet, contrary
to the predictions of V-learning, cue-evoked activity on these trial types did not differ. Of
course, this conclusion depends on the assumption that the activity of dopamine neurons has
not saturated at a level corresponding to roughly 70% of the value of the high-value, forced-choice
cue. Nevertheless, although we cannot definitively eliminate this interpretation, it seems
to us that our data are not fully consistent with the predictions of V-learning.
Instead, our results seem to be most consistent with the third alternative, known as Q-learning,
which predicts that cue-evoked activity on free-choice trials will reflect the value of the best
available option. By allowing error signals to reflect the best available option, this model
dissociates error signaling from subsequent actions. As a result, learning for antecedent events
is not penalized when animals choose to explore less valuable alternatives. Such exploratory
behavior would allow animals to recognize when existing knowledge needs to be updated to
reflect changing conditions.
Notably, whether our data are interpreted as supporting V-learning or Q-learning, our results
are at odds with a recent report in monkeys14, which did not support these two models. In that
study, monkeys were trained to associate different cues displayed on a computer monitor with
different probabilities of reward. On some trials, only one cue was presented and the monkey
had to select it to obtain reward, as in our forced-choice trials. On other trials, two cues were
presented together and the monkey could select either cue to obtain its associated reward, as
in our free-choice trials. The authors reported that cue-evoked activity on free-choice trials
always reflected the value of the cue that was ultimately selected. These results were interpreted
as showing that the activity of dopamine neurons complied with the predictions of SARSA
learning.
Several procedural differences between our study and this recent report might account for the
divergent findings discussed above. First, it is possible that dopamine signaling is different in
rats and in monkeys. However, our results and those of at least one other lab13 are fully
consistent with the proposal that dopamine neurons encode prediction errors in rats as in
primates. A second possibility is that the use of a unique odor cue on free-choice trials in the
current study biased toward signaling of the better option, as it required the rat to attend to only
a single item. The simultaneous presentation of two separate reward-predictive cues in the
previous report might have allowed the monkey to attend preferentially to the cue that was to
drive subsequent responding. A third possibility is the difference in training. Monkeys are often
highly over-trained, completing thousands of trials each day for many months; by comparison,
the rats in our study were only lightly trained by recording standards, completing perhaps 5,000
trials over the entire experiment. In addition, the rats had to update their existing knowledge
several times within each session to reflect changing reward contingencies. These conditions
presumably put a premium on exploratory behavior that might not have been present for the
monkeys in the delayed choice task; it might be particularly important to decouple error
signaling from subsequent actions in this context.
However, another far more intriguing possibility is that the divergent findings reflect a
legitimate difference between the encoding properties of dopamine neurons in SN and those
in the VTA. These areas project to very different neural targets37,38. Neurons in the SN project
preferentially to the dorsal striatum and other areas that have been implicated in habitual and
instrumental learning, in which animals learn about responses that predict reward39–41. By
contrast, neurons in the VTA project preferentially to the ventral striatum and limbic areas that
have been implicated in learning what cues predict reward42–46. This difference parallels the
different firing properties of dopamine neurons in these two circuits — firing on free-choice
trials in rat VTA neurons, reported here, signals the potential value of the cue, irrespective of
the response selected, whereas firing on free-choice trials in monkey SN neurons signals the
value of the response14. The unique properties of these circuits for encoding instrumental
versus Pavlovian associations might reflect, in part, the different teaching signals received from
midbrain dopamine neurons3.
Of course, activity on free-choice trials did eventually change to reflect the value of the chosen
reward. This transition occurred after cue sampling, just as the animal exited the odor port and
selected a well, thereby tracking the unfolding decision process. Indeed, viewed from another
perspective, the dopamine neurons are signaling the best available option both during cue
sampling and also thereafter. On high-value, free-choice trials the value of the best option
remains unchanged by the decision, whereas on low-value, free-choice trials, the value of the
best available option is higher before than after the decision. Thus, after the decision is made,
the best option available on free-choice trials is identical to that on forced-choice trials. Neural
activity in dopamine neurons in the rat VTA closely tracks this change. Consistent with
speculation in ref. 14, this indicates that dopamine neurons are early recipients of information
concerning the intention to take a particular action.
METHODS
Subjects
Male Long-Evans rats were obtained at 175–200g from Charles River Labs. Rats were tested
at the University of Maryland School of Medicine in accordance with its guidelines and those
of the US National Institutes of Health.
Surgical procedures and histology
Surgical procedures followed guidelines for aseptic technique. Electrodes were manufactured
and implanted as in prior recording experiments31. All rats had a drivable bundle of ten 25-µm
diameter FeNiCr wires (Stablohm 675, California Fine Wire) chronically implanted dorsal to
the VTA in the left hemisphere at 5.2 mm posterior to bregma, 0.7 mm laterally, and 7.0 mm
ventral to the brain surface. Wires were cut with surgical scissors to extend ~ 1 mm beyond
the cannula and electroplated with platinum (H2PtCl6, Aldrich) to an impedance of ~300 kΩ.
Cephalexin (15 mg kg−1) was administered twice daily for two weeks post-operatively.
For experiments involving apomorphine infusions, sterilized silastic catheters (Dow Corning)
were also implanted using published procedures47. An incision was made lateral to the midline
to expose the jugular vein. The catheter was inserted into the jugular vein and secured using
silk sutures. The catheter then passed subcutaneously to the top of the skull where it was
connected to the 22-gauge cannula (Plastics One) head mount, which was anchored on the skull
using screws and grip cement. Buprenorphine (0.1 mg kg−1, subcutaneous) was administered
post-operatively, and the catheters were flushed every 24–48 h with an antibiotic gentamicin/
saline solution (0.1 ml at 0.08 mg per 50 ml).
The final electrode position was marked by passing a 15-µA current through each electrode.
The rats were then perfused, and their brains removed and processed for histology31.
Dopamine cell identification
Neurons were screened for wide waveform and amplitude characteristics, and then tested with
a non-specific dopamine agonist, apomorphine (0.60–1.0 mg kg−1, intravenous). The
apomorphine test consisted of ~30 min of baseline recording, apomorphine infusion, and ~ 30
min post-infusion recording.
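The waveform-based classification itself is described in the legend of Figure 2c; as a rough sketch of that leave-one-out assignment rule (an illustration under assumed data structures and feature names, not the authors' analysis code):

    import numpy as np

    def classify_neuron(features, cluster_members, n_sd=3.0):
        # features: (spike duration, amplitude ratio) of the neuron being classified.
        # cluster_members: dict mapping cluster label -> array of member features,
        # computed without the neuron of interest (leave-one-out).
        matches = []
        for label, members in cluster_members.items():
            center = members.mean(axis=0)
            spread = members.std(axis=0)
            if np.all(np.abs(np.asarray(features) - center) <= n_sd * spread):
                matches.append(label)
        # Assign only if the neuron falls within 3 s.d. of exactly one cluster center;
        # neurons matching no cluster, or more than one, are left unclassified.
        return matches[0] if len(matches) == 1 else None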
Time-discounting choice task
Recording was conducted in aluminum chambers approximately 18 inches on each side, with
sloping walls narrowing to an area of 12 inches by 12 inches at the bottom. A central odor port was
located above the two fluid wells. Two lights were located above the panel. The odor port was
connected to an air flow dilution olfactometer to allow the rapid delivery of olfactory cues.
Odors were chosen from compounds obtained from International Flavors and Fragrances.
Trials were signaled by illumination of the panel lights inside the box. When these lights were
on, a nosepoke into the odor port resulted in delivery of the odor cue to a small hemicylinder
behind this opening. One of three odors was delivered to the port on each trial, in a
pseudorandom order. At odor offset, the rat had 3 s to make a response at one of the two fluid
wells located below the port. One odor instructed the rat to go to the left to get a reward, a
second odor instructed the rat to go to the right to get a reward, and a third odor indicated that
the rat could obtain a reward at either well. Odors were presented in a pseudorandom sequence
so that the free-choice odor was presented on 7/20 trials and the left/right odors were presented
in equal numbers. In addition, the same odor could be presented on no more than three
consecutive trials.
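As a rough illustration of these sequencing constraints (a sketch only; the exact scheduling procedure is an assumption beyond what is stated above):

    import random

    def next_odor(history):
        # Draw the next odor: the free-choice odor on roughly 7 of every 20 trials,
        # the left and right forced-choice odors otherwise with equal probability,
        # and never the same odor on more than three consecutive trials.
        while True:
            odor = "free" if random.random() < 7 / 20 else random.choice(["left", "right"])
            if history[-3:] != [odor] * 3:
                return odor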
Once the rats were trained to perform this basic task, we introduced blocks in which we
independently manipulated the size of the reward and the delay preceding reward delivery. For
recording, one well was randomly designated as short and the other long at the start of the
session (Fig. 1a, block 1). In the second block of trials these contingencies were switched (Fig.
1a, block 2). The length of the delay under long conditions abided by the following algorithm.
The side designated as long started off as 1 s and increased by 1 s every time that side was
chosen until it became 3 s. If the rat continued to choose that side, the length of the delay
increased by 1 s up to a maximum of 7 s. If the rat chose the side designated as long on fewer
than 8 out of the previous 10 choice trials then the delay was reduced by 1 s to a minimum of
3 s. The reward delay for long forced-choice trials was yoked to the delay in free-choice trials
during these blocks. In later blocks we held the delay preceding reward constant while
manipulating the size of reward (Fig. 1b). The reward was a 0.05-ml bolus of 10% sucrose
solution. The reward magnitude used in delay blocks was the same as the reward used in the
reward blocks. For the big reward, an additional bolus was delivered 500 ms after the first; this
brief separation made it highly apparent that the reward had become larger.
Occasionally a third bolus was added to further demonstrate positive prediction errors.
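A compact sketch of the delay-titration rule just described (reconstructed from the text above; the function and variable names are hypothetical and the rule is reduced to its stated bounds):

    def update_long_delay(delay_s, chose_long_history):
        # chose_long_history: one boolean per free-choice trial (most recent last),
        # True when the rat chose the side currently designated as 'long'.
        if chose_long_history and chose_long_history[-1]:
            delay_s = min(delay_s + 1, 7)   # each choice of the long side adds 1 s, up to 7 s
        last_ten = chose_long_history[-10:]
        if len(last_ten) == 10 and sum(last_ten) < 8:
            delay_s = max(delay_s - 1, 3)   # chosen on fewer than 8 of 10: shorten by 1 s, floor of 3 s
        return delay_s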
Single-unit recording
Wires were screened for activity daily; if no activity was detected, the rat was removed, and
the electrode assembly was advanced by 40 or 80 µm. Otherwise, active wires were selected
to be recorded, a session was conducted, and the electrode was advanced at the end of the
session. Neural activity was recorded using Plexon Multichannel Acquisition Processor
systems. Signals from the electrode wires were amplified 20× by an op-amp headstage (Plexon
Inc, HST/8o50-G20-GR), located on the electrode array. Immediately outside the training
chamber, the signals were passed through a differential pre-amplifier (Plexon Inc, PBX2/16sp-
r-G50/16fp-G50), where the single unit signals were amplified 50× and filtered at 150–9,000
Hz. The single unit signals were then sent to the Multichannel Acquisition Processor box,
where they were further filtered at 250–8,000 Hz, digitized at 40 kHz and amplified at 1–32×.
Waveforms (>2.5:1 signal-to-noise) were extracted from active channels and recorded to disk
by an associated workstation.
Data analysis
Units were sorted using Offline Sorter software from Plexon Inc. Sorted files were then
processed and analyzed in Neuroexplorer and Matlab. To examine activity related to reward
delivery or omission, we studied activity 500 ms after reward delivery or omission. We chose
500 ms because no other trial event occurred until at least 500 ms after reward delivery (or
omission). Initial analysis of cue-related activity was confined to activity starting after odor
onset and ending at the odor port exit. Later analysis of cue related activity examined activity
during the 500 ms while the odor was on. Analysis of free-choice versus forced-choice trials
included only trials occurring after the more valuable option was being chosen more than 50% of
the time in the current block.
Furthermore, we paired each free-choice trial with the immediately preceding and following
forced-choice trial of the same value. This procedure allowed us to control for the fact that
low-value choices might be more frequent early in a block of trials. We used t-tests to measure
difference between trial types (P < 0.05). Additionally, Pearson Chi-square tests (P < 0.05)
were used to compare the proportions of neurons.
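As a sketch of the pairing procedure described above (an illustration under an assumed trial structure, not the analysis code used in the study):

    def pair_free_with_forced(trials):
        # trials: ordered list of dicts with keys 'type' ('free' or 'forced') and 'value'
        # ('high' or 'low'); returns (free, preceding forced, following forced) index triples.
        pairs = []
        for i, trial in enumerate(trials):
            if trial["type"] != "free":
                continue
            same_value_forced = lambda j: (trials[j]["type"] == "forced"
                                           and trials[j]["value"] == trial["value"])
            before = next((j for j in range(i - 1, -1, -1) if same_value_forced(j)), None)
            after = next((j for j in range(i + 1, len(trials)) if same_value_forced(j)), None)
            if before is not None and after is not None:
                pairs.append((i, before, after))
        return pairs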
Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
ACKNOWLEDGMENTS
We thank Y. Niv, P. Shepard, G. Morris and W. Schultz for thoughtful comments on this manuscript, and S.
Warrenburg at International Flavors and Fragrances for his assistance in obtaining odor compounds. This work was
supported by grants from the US National Institute on Drug Abuse (R01-DA015718, G.S.; K01-DA021609, M.R.R.),
the National Institute of Mental Health (F31-MH080514, D.J.C.), the National Institute on Aging (R01-AG027097,
G.S.) and the National Institute of Neurological Disorders and Stroke (T32-NS07375, M.R.R.).
References
1. Wise RA. Dopamine, learning and motivation. Nat. Rev. Neurosci 2004;5:483–494. [PubMed:
15152198]
2. Schultz W. Getting formal with dopamine and reward. Neuron 2002;36:241–263. [PubMed: 12383780]
3. Dayan P, Balleine BW. Reward, motivation and reinforcement learning. Neuron 2002;36:285–298.
[PubMed: 12383782]
4. Day JJ, Roitman MF, Wightman RM, Carelli RM. Associative learning mediates dynamic shifts in
dopamine signaling in the nucleus accumbens. Nat. Neurosci 2007;10:1020–1028. [PubMed:
17603481]
5. Mirenowicz J, Schultz W. Importance of unpredictability for reward responses in primate dopamine
neurons. J. Neurophysiol 1994;72:1024–1027. [PubMed: 7983508]
6. Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine
neurons. Science 2003;299:1898–1902. [PubMed: 12649484]
7. Tobler PN, Dickinson A, Schultz W. Coding of predicted reward omission by dopamine neurons in a
conditioned inhibition paradigm. J. Neurosci 2003;23:10402–10410. [PubMed: 14614099]
8. Hollerman JR, Schultz W. Dopamine neurons report an error in the temporal prediction of reward
during learning. Nat. Neurosci 1998;1:304–309. [PubMed: 10195164]
9. Waelti P, Dickinson A, Schultz W. Dopamine responses comply with basic assumptions of formal
learning theory. Nature 2001;412:43–48. [PubMed: 11452299]
10. Montague PR, Dayan P, Sejnowski TJ. A framework for mesencephalic dopamine systems based on
predictive hebbian learning. J. Neurosci 1996;16:1936–1947. [PubMed: 8774460]
11. Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error
signal. Neuron 2005;47:129–141. [PubMed: 15996553]
12. Nakahara H, Itoh H, Kawagoe R, Takikawa Y, Hikosaka O. Dopamine neurons can represent context-
dependent prediction error. Neuron 2004;41:269–280. [PubMed: 14741107]
13. Pan WX, Schmidt R, Wickens JR, Hyland BI. Dopamine cells respond to predicted events during
classical conditioning: evidence for eligibility traces in the reward-learning network. J. Neurosci
2005;25:6235–6242. [PubMed: 15987953]
14. Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H. Midbrain dopamine neurons encode decisions
for future action. Nat. Neurosci 2006;9:1057–1063. [PubMed: 16862149]
15. Kawagoe R, Takikawa Y, Hikosaka O. Reward-predicting activity of dopamine and caudate neurons
—a possible mechanism of motivational control of saccadic eye movement. J. Neurophysiol
2004;91:1013–1024. [PubMed: 14523067]
16. Cardinal RN, Pennicott DR, Sugathapala CL, Robbins TW, Everitt BJ. Impulsive choice induced in
rats by lesions of the nucleus accumbens core. Science 2001;292:2499–2501. [PubMed: 11375482]
17. Evenden JL, Ryan CN. The pharmacology of impulsive behaviour in rats: the effects of drugs on
response choice with varying delays of reinforcement. Psychopharmacology (Berl.) 1996;128:161–
170. [PubMed: 8956377]
18. Herrnstein RJ. Relative and absolute strength of response as a function of frequency of reinforcement.
J. Exp. Anal. Behav 1961;4:267–272. [PubMed: 13713775]
19. Ho MY, Mobini S, Chiang TJ, Bradshaw CM, Szabadi E. Theory and method in the quantitative
analysis of “impulsive choice” behaviour: implications for psychopharmacology.
Psychopharmacology (Berl.) 1999;146:362–372. [PubMed: 10550487]
20. Mobini S, et al. Effects of lesions of the orbitofrontal cortex on sensitivity to delayed and probabilistic
reinforcement. Psychopharmacology (Berl.) 2002;160:290–298. [PubMed: 11889498]
21. Kahneman D, Tversky A. Choices, values and frames. Am. Psychol 1984;39:341–350.
22. Kalenscher T, et al. Single units in the pigeon brain integrate reward amount and time-to-reward in
an impulsive choice task. Curr. Biol 2005;15:594–602. [PubMed: 15823531]
23. Loewenstein G, Elster J, editors. Choice Over Time. New York: Russell Sage Foundation; 1992.
24. Thaler R. Some empirical evidence on dynamic inconsistency. Econ. Lett 1981;8:201–207.
25. Winstanley CA, Theobald DE, Cardinal RN, Robbins TW. Contrasting roles of basolateral amygdala
and orbitofrontal cortex in impulsive choice. J. Neurosci 2004;24:4718–4722. [PubMed: 15152031]
26. Cardinal RN, Winstanley CA, Robbins TW, Everitt BJ. Limbic corticostriatal systems and delayed
reinforcement. Ann. NY Acad. Sci 2004;1021:33–50. [PubMed: 15251872]
27. Kheramin S, et al. Effects of orbital prefrontal cortex dopamine depletion on intertemporal choice: a
quantitative analysis. Psychopharmacology (Berl.) 2004;175:206–214. [PubMed: 14991223]
28. Wade TR, de Wit H, Richards JB. Effects of dopaminergic drugs on delayed reward as a measure of
impulsive behavior in rats. Psychopharmacology (Berl.) 2000;150:90–101. [PubMed: 10867981]
29. Cardinal RN, Robbins TW, Everitt BJ. The effects of d-amphetamine, chlordiazepoxide, alpha-
flupenthixol and behavioural manipulations on choice of signalled and unsignalled delayed
reinforcement in rats. Psychopharmacology (Berl.) 2000;152:362–375. [PubMed: 11140328]
30. Roesch MR, Takahashi Y, Gugsa N, Bissonette GB, Schoenbaum G. Previous cocaine exposure
makes rats hypersensitive to both delay and reward magnitude. J. Neurosci 2007;27:245–250.
[PubMed: 17202492]
31. Roesch MR, Taylor AR, Schoenbaum G. Encoding of time-discounted rewards in orbitofrontal cortex
is independent of value representation. Neuron 2006;51:509–520. [PubMed: 16908415]
32. Tobler PN, Fiorillo CD, Schultz W. Adaptive coding of reward value by dopamine neurons. Science
2005;307:1642–1645. [PubMed: 15761155]
33. Kiyatkin EA, Rebec GV. Heterogeneity of ventral tegmental area neurons: single-unit recording and
iontophoresis in awake, unrestrained rats. Neuroscience 1998;85:1285–1309. [PubMed: 9681963]
34. Bunney BS, Aghajanian GK, Roth RH. Comparison of effects of L-dopa, amphetamine and
apomorphine on firing rate of rat dopaminergic neurones. Nat. New Biol 1973;245:123–125.
[PubMed: 4518113]
35. Skirboll LR, Grace AA, Bunney BS. Dopamine auto- and postsynaptic receptors: electrophysiological
evidence for differential sensitivity to dopamine agonists. Science 1979;206:80–82. [PubMed:
482929]
36. Niv Y, Daw ND, Dayan P. Choice values. Nat. Neurosci 2006;9:987–988. [PubMed: 16871163]
37. Haber SN, Fudge JL, McFarland NR. Striatonigrostriatal pathways in primates form an ascending
spiral from the shell to the dorsolateral striatum. J. Neurosci 2000;20:2369–2382. [PubMed:
10704511]
38. Joel D, Weiner I. The connections of the dopaminergic system with the striatum in rats and primates:
an analysis with respect to the functional and compartmental organization of the striatum.
Neuroscience 2000;96:451–474. [PubMed: 10717427]
39. Yin HH, Knowlton BJ, Balleine BW. Lesions of dorsolateral striatum preserve outcome expectancy,
but disrupt habit formation in instrumental learning. Eur. J. Neurosci 2004;19:181–189. [PubMed:
14750976]
40. O’Doherty J, et al. Dissociable roles of ventral and dorsal striatum in instrumental conditioning.
Science 2004;304:452–454. [PubMed: 15087550]
41. Knowlton BJ, Mangels JA, Squire L. A neostriatal habit learning system in humans. Science
1996;273:1399–1402. [PubMed: 8703077]
42. Hatfield T, Han JS, Conley M, Gallagher M, Holland P. Neurotoxic lesions of basolateral, but not
central, amygdala interfere with Pavlovian second-order conditioning and reinforcer devaluation
effects. J. Neurosci 1996;16:5256–5265. [PubMed: 8756453]
43. Gallagher M, McMahan RW, Schoenbaum G. Orbitofrontal cortex and representation of incentive
value in associative learning. J. Neurosci 1999;19:6610–6614. [PubMed: 10414988]
44. Baxter MG, Parker A, Lindner CCC, Izquierdo AD, Murray EA. Control of response selection by
reinforcer value requires interaction of amygdala and orbitofrontal cortex. J. Neurosci 2000;20:4311–
4319. [PubMed: 10818166]
45. Gottfried JA, O’Doherty J, Dolan RJ. Encoding predictive reward value in human amygdala and
orbitofrontal cortex. Science 2003;301:1104–1107. [PubMed: 12934011]
46. Parkinson JA, Cardinal RN, Everitt BJ. Limbic cortical-ventral striatal systems underlying appetitive
conditioning. Prog. Brain Res 2000;126:263–285. [PubMed: 11105652]
47. Lu L, et al. Central amygdala ERK signaling pathway is critical to incubation of cocaine craving.
Nat. Neurosci 2005;8:212–219. [PubMed: 15657599]
Figure 1.
Choice task in which delay and size of reward were manipulated. (a,b) Sequence of events in
delay blocks (a) and reward blocks (b). At the start of each recording session one well was
randomly designated as short and the other as long (block 1). In the second block of trials these
contingencies were switched (block 2). In blocks 3 and 4, we held the delay constant while
manipulating the size of the reward. At least 60 trials were collected per block. (c) Picture of
apparatus used in task, showing odor port (~ 2.5 cm diameter) and the two fluid wells. (d) The
impact of delay length and reward size on behavior on forced-choice trials. Bar graphs show
percent correct (left) and reaction time (RT, right) across all recording sessions. (e,f) The impact
of delay length and reward size on behavior on free-choice trials that were interleaved within
the forced-choice trials. Line graphs show choice behavior before and after the switch from
short to long (e) and from big to small (f) reward; inset bar graphs show average percent choice
for short versus long (e) or big versus small (f) across all free-choice trials. Asterisks indicate
planned comparisons revealing statistically significant differences (t-test, P < 0.05). Error bars,
s.e.m.
Figure 2.
Locations, representative waveforms and classification of putative dopamine neurons. (a)
Location of the electrode track in each rat; boxes indicate approximate extent of lateral (and
anteroposterior) spread of the wires (~ 1 mm) centered on the final position (dot). (b) Example
waveforms for putative dopamine (DA) and nondopamine (ND) neurons. (c) Results of cluster
analysis based on spike duration (d in waveform inset, y-axis, ms) and the amplitude ratio (x-
axis) of the initial negative (n in inset) and positive (p in inset) segments. The center and
variance of each cluster were computed without data from the neuron of interest, and then that
neuron was assigned to a cluster if it was within 3 s.d. of the center. Neurons that met this
criterion for more than one cluster were not classified. This process was repeated for each
neuron. Putative dopamine neurons are shown in black; neurons that classified with other
clusters, no clusters or more than one cluster are shown as open symbols. Neurons recorded
before and after intravenous infusion of apomorphine are shown in red. Inset cumulative sum
plots show the effects of apomorphine on baseline firing in two DA neurons and one ND neuron.
Figure 3.
Activity during reward reflects prediction errors in a subpopulation of cue-responsive
dopamine neurons. (a) Left, example of error signaling (negative prediction error) when an
expected reward was omitted during the transition from a ‘big’ to a ‘small’ block (gray arrow).
Right, example of error signaling (positive prediction error) from the neuron shown in a when
reward was instituted during the transition from a ‘small’ to a ‘big’ block (first black arrow).
For this neuron, an additional third bolus was delivered several trials later (second black arrow)
to further illustrate prediction error encoding. Activity is aligned to the onset of the first
unexpected reward. Raster display includes free- and forced-choice trials. Consistent with
encoding of errors, activity changes were transient, diminishing as the rats learned to expect
(or not expect) reward at that time during each trial block. These effects are quantified during
reward omission (left) and reward delivery (right) for dopamine neurons that were cue- and
reward-responsive (b; n = 19) and dopamine neurons that were cue-responsive only (c; n = 14)
by comparing the average firing rate of each neuron during the 500 ms after an expected reward
was omitted (left) or an unexpected reward was instituted (right) in the first five versus the last
fifteen trials in the appropriate trial blocks (see text). Black dots represent neurons in which
the difference in firing was statistically significant (t-test; P < 0.05). P-values in scatter plots
indicate results of chi-square tests comparing the number of neurons above and below the
diagonal in each plot. Bar graphs represent average firing rates for each population. Asterisks
indicate planned comparisons revealing statistically significant differences (t-test, P < 0.05).
Error bars, s.e.m.
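The quantification in b and c reduces to a per-neuron comparison of the 500-ms epoch in the first five versus the last fifteen trials, followed by a population-level count of neurons falling above and below the unity line. A minimal sketch, assuming the epoch firing rates have already been extracted into one array per neuron (all names hypothetical):

```python
import numpy as np
from scipy import stats

def early_vs_late(epoch_rates_per_neuron, n_early=5, n_late=15, alpha=0.05):
    """epoch_rates_per_neuron: list of 1-D arrays, one per neuron, holding the
    firing rate in the 500-ms epoch on each trial of the relevant block, in
    trial order. Returns per-neuron means, per-neuron significance flags, and
    the population p-value from the chi-square comparison of neurons above
    versus below the diagonal."""
    early_means, late_means, significant = [], [], []
    for rates in epoch_rates_per_neuron:
        early, late = rates[:n_early], rates[-n_late:]
        early_means.append(early.mean())
        late_means.append(late.mean())
        t, p = stats.ttest_ind(early, late)
        significant.append(p < alpha)          # black dots in the scatter plots
    early_means = np.array(early_means)
    late_means = np.array(late_means)
    above = int(np.sum(early_means > late_means))
    below = int(np.sum(early_means < late_means))
    chi2, p_pop = stats.chisquare([above, below])   # even split as the null
    return early_means, late_means, significant, p_pop
```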
Figure 4.
Cue-evoked activity in reward-responsive dopamine neurons reflects the value of the predicted
rewards. (a) Single-unit example of cue-evoked activity in a dopamine neuron on forced-choice
trials. Initially the odor predicted that the reward would be delivered immediately (‘short’).
Subsequently, the same odor predicted a delayed reward (‘long’), an immediate but large
reward (‘big’), and finally an immediate but small reward (‘small’). Note that the ‘short’ and
‘small’ conditions were identical (1 bolus of reward after 500 ms) but differed in their relative
value because ‘short’ was paired with ‘long’ in the opposite well whereas ‘small’ was paired
with ‘big’. (b) Heat plots showing average activity of all cue/reward-responsive dopamine
neurons (n = 19) during the first and last twenty (10 per direction) forced-choice trials in each
training block (Fig. 1; blocks 1–4). Activity is shown, aligned on odor onset (‘align odor’) and
reward delivery (‘align reward’). Blocks 1–4 are shown in the order performed (top to bottom).
During block 1, rats responded after a ‘long’ delay or a ‘short’ delay to receive reward (the starting direction, left or right, was counterbalanced in each block and is collapsed here). In block 2,
the locations of the ‘short’ delay and ‘long’ delay were reversed. In blocks 3 and 4, delays were
held constant but the size of the reward (‘big’ or ‘small’) varied. Line display between heat
plots shows the rats’ behavior on free-choice trials that were interleaved within the forced-
choice trials from which the neural data were taken. Evidence for encoding of positive and
negative prediction errors described in Figure 3 can also be observed here whenever reward is
unexpectedly delivered (white arrows) or omitted (gray arrow, analysis epoch). (c,d) Line
graphs summarizing the data shown in b. Lines representing average firing rate are broken 1
s after cue onset and 500 ms before reward delivery so that activity can be aligned on both
events. Insets: bar graphs represent average firing rates during the epoch indicated by the gray bar. Blue, short; red, long; green,
big; orange, small; dashed, first 10; solid, last 10. Asterisks indicate planned comparisons
revealing statistically significant differences (t-test, P < 0.05). Error bars, s.e.m.
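The heat plots in b are peri-event averages computed twice, once aligned on odor onset and once on reward delivery. The sketch below shows the basic alignment step for one neuron and one event type, assuming spike and event times are stored in seconds; the names and window parameters are illustrative, not the values used in the original analysis.

```python
import numpy as np

def peri_event_rate(spike_times, event_times, window=(-0.5, 1.5), bin_size=0.05):
    """Average firing rate around an event, across trials.

    spike_times: 1-D array of spike times for one neuron (s).
    event_times: 1-D array with one alignment event per trial (s).
    Returns (bin_centers, mean_rate_in_Hz)."""
    edges = np.arange(window[0], window[1] + bin_size, bin_size)
    counts = np.zeros(len(edges) - 1)
    for t0 in event_times:
        rel = spike_times - t0
        rel = rel[(rel >= window[0]) & (rel < window[1])]
        counts += np.histogram(rel, bins=edges)[0]
    rate = counts / (len(event_times) * bin_size)
    centers = edges[:-1] + bin_size / 2
    return centers, rate

# For the heat plots, each neuron's trials would be split into the first and
# last 10 forced-choice trials per direction in each block, and the two
# alignments (odor onset, reward delivery) placed side by side along time.
```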
Figure 5.
Cue-evoked activity in reward-responsive dopamine neurons covaries with the delay and size
of the predicted reward and its relative value. (a) Comparison of the difference in firing rate
on high- and low-value trials for each cue/reward-responsive neuron (n = 19), calculated
separately for ‘delay’ (short-long) and ‘reward’ blocks (big-small). Colored dots represent
those neurons that showed a significant difference in firing between ‘high’ and ‘low’ conditions
(t-test; P < 0.05; blue: delay; green: reward; black: both reward and delay). Difference scores
were significantly higher in the last fifteen trials of each block (right), indicating that cue
selectivity developed with learning. Furthermore, the scores calculated from ‘delay’ and
‘reward’ blocks were significantly correlated after learning (right), indicating that encoding of
cue value co-varied across the two value manipulations. (b) Average firing rate for the same
cue/reward-responsive neurons (n = 19) under ‘short’ versus ‘small’ conditions. Purple dots
represent neurons that showed a significant difference in firing between ‘short’ and ‘small’
conditions (t-test, P < 0.05). Neurons were significantly more likely to fire more strongly under the ‘short’ than under the ‘small’ condition (chi-square, P < 0.001). (c,d) Same analysis as in a and b for
cue-responsive neurons that did not respond to reward (n = 14).
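The difference scores in a and the short-versus-small comparison in b both reduce to per-neuron contrasts of cue-epoch firing rates. A sketch of the difference-score calculation, assuming the rates have already been extracted per neuron (all names hypothetical):

```python
import numpy as np
from scipy import stats

def value_difference_scores(high_rates, low_rates):
    """high_rates / low_rates: lists (one entry per neuron) of 1-D arrays of
    cue-evoked firing rates on high-value and low-value trials, respectively.
    Returns the per-neuron difference score (high minus low) and the p-value
    of a per-neuron t-test (the colored dots in the scatter plots)."""
    scores, pvals = [], []
    for hi, lo in zip(high_rates, low_rates):
        scores.append(hi.mean() - lo.mean())
        pvals.append(stats.ttest_ind(hi, lo).pvalue)
    return np.array(scores), np.array(pvals)

# Correlating the delay-block and size-block scores across neurons, as in the
# right-hand panel of a, would then be a single call, e.g.:
# r, p = stats.pearsonr(delay_scores, size_scores)
```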
Figure 6.
Cue-evoked activity on free-choice trials reflects the more valuable option. (a–d) Panels show
average activity of all cue/reward-responsive dopamine neurons (n = 19) on forced- and free-
choice trials, collapsed across direction, for ‘delay’ (a,b) and ‘reward’ blocks (c,d). To control
for learning, we included only trials after behavior reflected the contingencies in the current block (>50% choice of the more valuable option). Furthermore, to control for the possibility that
low-value choices might be more frequent early during this block of trials, we paired each free-
choice trial with the immediately preceding and following forced-choice trial of the same value.
The line graphs show average activity on these forced- and free-choice trials in each condition, aligned to odor onset. Bar graphs represent the average firing rate (FR) from odor onset to odor offset (top) and the average reaction time (bottom). Blue, short; red, long;
green, big; orange, small. Long-forced versus short-free, t-test, P = 0.002; long-forced versus
long-free, t-test, P = 0.002; long-forced versus short-forced, t-test, P = 0.001; short-forced
versus short-free, t-test, P = 0.641; short-forced versus long-free, t-test, P = 0.431; long-free
versus short-free, t-test, P = 0.220; small-forced versus big-free, t-test, P = 0.004; small-forced
versus small-free, t-test, P = 0.006; small-forced versus big-forced, t-test, P = 0.002; big-forced
versus big-free, t-test, P = 0.244; big-forced versus small-free, t-test, P = 0.104; small-free
versus big-free, t-test, P = 0.221.
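The trial-pairing control described above can be expressed compactly: for each free-choice trial, find the nearest preceding and following forced-choice trials of the same value condition. A sketch under the assumption that the session is stored as an ordered list of simple per-trial records (the field names are hypothetical):

```python
def pair_free_with_forced(trials):
    """trials: list of dicts in session order, each like
    {'index': 12, 'type': 'free' or 'forced',
     'condition': 'short'/'long'/'big'/'small'}.
    Returns (free_trial, preceding_forced, following_forced) triples in which
    both forced-choice trials share the free-choice trial's value condition."""
    pairs = []
    for i, trial in enumerate(trials):
        if trial['type'] != 'free':
            continue
        prev_forced = next((t for t in reversed(trials[:i])
                            if t['type'] == 'forced'
                            and t['condition'] == trial['condition']), None)
        next_forced = next((t for t in trials[i + 1:]
                            if t['type'] == 'forced'
                            and t['condition'] == trial['condition']), None)
        if prev_forced is not None and next_forced is not None:
            pairs.append((trial, prev_forced, next_forced))
    return pairs
```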