Article
"Reconciling reinforcement learning models with behavioral extinction and renewal: Implications for addiction, relapse, and problem gambling": Correction.
Department of Neuroscience.
Psychological Review (impact factor: 7.76). 08/2009; 116(3):518. DOI:10.1037/a0016243
Source: PubMed
Cited In (4)
Article: A neural computational model of incentive salience.
ABSTRACT: Incentive salience is a motivational property with 'magnet-like' qualities. When attributed to reward-predicting stimuli (cues), incentive salience triggers a pulse of 'wanting' and an individual is pulled toward the cues and reward. A key computational question is how incentive salience is generated during a cue re-encounter, which combines both learning and the state of limbic brain mechanisms. Learning processes, such as temporal-difference models, provide one way for stimuli to acquire cached predictive values of rewards. However, empirical data show that subsequent incentive values are also modulated on the fly by dynamic fluctuation in physiological states, altering cached values in ways requiring additional motivation mechanisms. Dynamic modulation of incentive salience for a Pavlovian conditioned stimulus (CS or cue) occurs during certain states, without necessarily requiring (re)learning about the cue. In some cases, dynamic modulation of cue value occurs during states that are quite novel, never having been experienced before, and even prior to experience of the associated unconditioned reward in the new state. Such cases can include novel drug-induced mesolimbic activation and addictive incentive-sensitization, as well as natural appetite states such as salt appetite. Dynamic enhancement specifically raises the incentive salience of an appropriate CS, without necessarily changing that of other CSs. Here we suggest a new computational model that modulates incentive salience by integrating changing physiological states with prior learning. We support the model with behavioral and neurobiological data from empirical tests that demonstrate dynamic elevations in cue-triggered motivation (involving natural salt appetite, and drug-induced intoxication and sensitization). Our data call for a dynamic model of incentive salience, such as presented here. 
Computational models can adequately capture fluctuations in cue-triggered 'wanting' only by incorporating modulation of previously learned values by natural appetite and addiction-related states.
PLoS Computational Biology 08/2009; 5(7):e1000437. · 5.22 Impact Factor
Article: Speed/accuracy trade-off between the habitual and the goal-directed processes.
ABSTRACT: Instrumental responses are hypothesized to be of two kinds: habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The existence of the two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparative advantages that they have at different stages of learning. In this paper, we assume that the goal-directed system is behaviourally flexible, but slow in choice selection. The habitual system, in contrast, is fast in responding, but inflexible in adapting its behavioural strategy to new conditions. Based on these assumptions and using the computational theory of reinforcement learning, we propose a normative model for arbitration between the two processes that makes an approximately optimal balance between search-time and accuracy in decision making. Behaviourally, the model can explain experimental evidence on behavioural sensitivity to outcome at the early stages of learning, but insensitivity at the later stages. It also explains that when two choices with equal incentive values are available concurrently, the behaviour remains outcome-sensitive, even after extensive training. Moreover, the model can explain choice reaction time variations during the course of learning, as well as the experimental observation that as the number of choices increases, the reaction time also increases. Neurobiologically, by assuming that phasic and tonic activities of midbrain dopamine neurons carry the reward prediction error and the average reward signals used by the model, respectively, the model predicts that whereas phasic dopamine indirectly affects behaviour through reinforcing stimulus-response associations, tonic dopamine can directly affect behaviour through manipulating the competition between the habitual and the goal-directed systems and thus, affect reaction time.
PLoS Computational Biology 05/2011; 7(5):e1002055. · 5.22 Impact Factor
Article: Altered risk-based decision making following adolescent alcohol use results from an imbalance in reinforcement learning in rats.
ABSTRACT: Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life.
PLoS ONE 01/2012; 7(5):e37357. · 4.09 Impact Factor
Keywords
A. David Redish
Adam Johnson
behavioral extinction
current article
extinction process
figures 1
following abstract
observed cues
original article
PsycINFO Database Record
Reconciling reinforcement
reward error
reward prediction error signal
situation recognition process
situation-action pairs
Steve Jensen
supplemental material
TDRL model
temporal difference reinforcement
Zeb Kurth-Nelson