Article

"Reconciling reinforcement learning models with behavioral extinction and renewal: Implications for addiction, relapse, and problem gambling": Correction.

Department of Neuroscience.
Psychological Review (impact factor: 7.76). 08/2009; 116(3):518. DOI:10.1037/a0016243
Source: PubMed

ABSTRACT Reports an error in "Reconciling reinforcement learning models with behavioral extinction and renewal: Implications for addiction, relapse, and problem gambling" by A. David Redish, Steve Jensen, Adam Johnson and Zeb Kurth-Nelson (Psychological Review, 2007[Jul], Vol 114[3], 784-805). In the current article, the URL for the supplemental material was incomplete in the legends of figures 1 and 3-8. The complete URL is: http://dx.doi.org/10.1037/0033-295X.114.3.784.supp. (The following abstract of the original article appeared in record 2007-10421-010.) Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL models are based on the hypothesis that dopamine carries a reward prediction error signal; these models predict reward by driving that reward error to zero. The authors construct a TDRL model that can accommodate extinction and renewal through two simple processes: (a) a TDRL process that learns the value of situation-action pairs and (b) a situation recognition process that categorizes the observed cues into situations. This model has implications for dysfunctional states, including relapse after addiction and problem gambling. (PsycINFO Database Record (c) 2009 APA, all rights reserved).
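The abstract's point that a plain TDRL model treats extinction as unlearning can be illustrated with a minimal sketch. This is a generic single-cue temporal-difference update, not the authors' model: the learning rate `alpha`, the trial counts, the 0.9 criterion, and all function names are assumptions chosen for illustration.

```python
def td_update(value, reward, alpha=0.1, next_value=0.0, gamma=1.0):
    """One temporal-difference update for a single cue.

    delta is the reward prediction error; learning drives it toward zero.
    """
    delta = reward + gamma * next_value - value
    return value + alpha * delta


def run_phase(value, reward, n_trials, alpha=0.1):
    """Present the cue n_trials times with a fixed reward."""
    for _ in range(n_trials):
        value = td_update(value, reward, alpha)
    return value


def trials_to_criterion(value, criterion=0.9, alpha=0.1):
    """Count rewarded trials needed for the cue value to reach criterion."""
    n = 0
    while value < criterion:
        value = td_update(value, 1.0, alpha)
        n += 1
    return n


v_acquired = run_phase(0.0, 1.0, 100)            # acquisition: cue then reward
v_extinguished = run_phase(v_acquired, 0.0, 100)  # extinction: cue, no reward

naive_trials = trials_to_criterion(0.0)
relearn_trials = trials_to_criterion(v_extinguished)
print(v_acquired, v_extinguished, naive_trials, relearn_trials)
```

In this sketch extinction simply erases the learned value, so relearning after extinction takes as many trials as the original acquisition. That is the limitation the abstract attributes to standard TDRL models, and it is what the authors' added situation recognition process is meant to address.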

  • Article: A neural computational model of incentive salience.
    ABSTRACT: Incentive salience is a motivational property with 'magnet-like' qualities. When attributed to reward-predicting stimuli (cues), incentive salience triggers a pulse of 'wanting' and an individual is pulled toward the cues and reward. A key computational question is how incentive salience is generated during a cue re-encounter, which combines both learning and the state of limbic brain mechanisms. Learning processes, such as temporal-difference models, provide one way for stimuli to acquire cached predictive values of rewards. However, empirical data show that subsequent incentive values are also modulated on the fly by dynamic fluctuation in physiological states, altering cached values in ways requiring additional motivation mechanisms. Dynamic modulation of incentive salience for a Pavlovian conditioned stimulus (CS or cue) occurs during certain states, without necessarily requiring (re)learning about the cue. In some cases, dynamic modulation of cue value occurs during states that are quite novel, never having been experienced before, and even prior to experience of the associated unconditioned reward in the new state. Such cases can include novel drug-induced mesolimbic activation and addictive incentive-sensitization, as well as natural appetite states such as salt appetite. Dynamic enhancement specifically raises the incentive salience of an appropriate CS, without necessarily changing that of other CSs. Here we suggest a new computational model that modulates incentive salience by integrating changing physiological states with prior learning. We support the model with behavioral and neurobiological data from empirical tests that demonstrate dynamic elevations in cue-triggered motivation (involving natural salt appetite, and drug-induced intoxication and sensitization). Our data call for a dynamic model of incentive salience, such as presented here. Computational models can adequately capture fluctuations in cue-triggered 'wanting' only by incorporating modulation of previously learned values by natural appetite and addiction-related states.
    PLoS Computational Biology 08/2009; 5(7):e1000437. · 5.22 Impact Factor
  • Article: Speed/accuracy trade-off between the habitual and the goal-directed processes.
    ABSTRACT: Instrumental responses are hypothesized to be of two kinds: habitual and goal-directed, mediated by the sensorimotor and the associative cortico-basal ganglia circuits, respectively. The existence of the two heterogeneous associative learning mechanisms can be hypothesized to arise from the comparative advantages that they have at different stages of learning. In this paper, we assume that the goal-directed system is behaviourally flexible, but slow in choice selection. The habitual system, in contrast, is fast in responding, but inflexible in adapting its behavioural strategy to new conditions. Based on these assumptions and using the computational theory of reinforcement learning, we propose a normative model for arbitration between the two processes that makes an approximately optimal balance between search-time and accuracy in decision making. Behaviourally, the model can explain experimental evidence on behavioural sensitivity to outcome at the early stages of learning, but insensitivity at the later stages. It also explains that when two choices with equal incentive values are available concurrently, the behaviour remains outcome-sensitive, even after extensive training. Moreover, the model can explain choice reaction time variations during the course of learning, as well as the experimental observation that as the number of choices increases, the reaction time also increases. Neurobiologically, by assuming that phasic and tonic activities of midbrain dopamine neurons carry the reward prediction error and the average reward signals used by the model, respectively, the model predicts that whereas phasic dopamine indirectly affects behaviour through reinforcing stimulus-response associations, tonic dopamine can directly affect behaviour through manipulating the competition between the habitual and the goal-directed systems and thus, affect reaction time.
    PLoS Computational Biology 05/2011; 7(5):e1002055. · 5.22 Impact Factor
  • Article: Altered risk-based decision making following adolescent alcohol use results from an imbalance in reinforcement learning in rats.
    ABSTRACT: Alcohol use during adolescence has profound and enduring consequences on decision-making under risk. However, the fundamental psychological processes underlying these changes are unknown. Here, we show that alcohol use produces over-fast learning for better-than-expected, but not worse-than-expected, outcomes without altering subjective reward valuation. We constructed a simple reinforcement learning model to simulate altered decision making using behavioral parameters extracted from rats with a history of adolescent alcohol use. Remarkably, the learning imbalance alone was sufficient to simulate the divergence in choice behavior observed between these groups of animals. These findings identify a selective alteration in reinforcement learning following adolescent alcohol use that can account for a robust change in risk-based decision making persisting into later life.
    PLoS ONE 01/2012; 7(5):e37357. · 4.09 Impact Factor
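The learning imbalance described in the last abstract above (faster learning from better-than-expected than from worse-than-expected outcomes) can be sketched as a value update with separate learning rates for positive and negative prediction errors. This is an illustrative sketch only, not the model from that paper: the learning rates, the 50/50 risky option, the trial count, and all names are assumptions.

```python
import random


def asymmetric_update(value, outcome, alpha_pos, alpha_neg):
    """Update a value estimate with a sign-dependent learning rate."""
    delta = outcome - value  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return value + alpha * delta


def learned_risky_value(alpha_pos, alpha_neg, n_trials=5000, seed=0):
    """Average learned value of a risky option paying 1 or 0 with equal odds.

    The average is taken over the second half of the trials, after the
    estimate has settled around its long-run level.
    """
    rng = random.Random(seed)
    value = 0.0
    tail = []
    for t in range(n_trials):
        outcome = 1.0 if rng.random() < 0.5 else 0.0
        value = asymmetric_update(value, outcome, alpha_pos, alpha_neg)
        if t >= n_trials // 2:
            tail.append(value)
    return sum(tail) / len(tail)


balanced = learned_risky_value(alpha_pos=0.1, alpha_neg=0.1)
imbalanced = learned_risky_value(alpha_pos=0.3, alpha_neg=0.1)
print(balanced, imbalanced)
```

Under these assumptions the balanced learner values the risky option near its true mean of 0.5, while the imbalanced learner overvalues it. That overvaluation would bias choice toward risk even though the outcomes themselves are valued identically, which matches the abstract's claim that the imbalance alone can shift risk-based choice.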

Keywords

A. David Redish
Adam Johnson
behavioral extinction
current article
extinction process
figures 1
following abstract
observed cues
original article
PsycINFO Database Record
Reconciling reinforcement
reward error
reward prediction error signal
situation recognition process
situation-action pairs
Steve Jensen
supplemental material
TDRL model
temporal difference reinforcement
Zeb Kurth-Nelson

A. David Redish