ABSTRACT: Decision-making involves two fundamental axes of control: valence, spanning reward and punishment, and action, spanning invigoration and inhibition. We recently exploited a go/no-go task whose contingencies explicitly decouple valence and action to show that these axes are inextricably coupled during learning. This results in a disadvantage in learning to go to avoid punishment and in learning to no-go to obtain a reward. The neuromodulators dopamine and serotonin are likely to play a role in these asymmetries: dopamine signals anticipation of future rewards and is involved in invigorating motor responses that lead to reward, but it also arbitrates between different forms of control. Conversely, serotonin is implicated in motor inhibition and punishment processing.
To investigate the role of dopamine and serotonin in the interaction between action and valence during learning.
We combined computational modeling with pharmacological manipulation in 90 healthy human volunteers, using levodopa and citalopram to affect dopamine and serotonin, respectively.
We found that, after administration of levodopa, action learning was less affected by outcome valence than in the placebo and citalopram groups, highlighting a predominant effect of levodopa, in this context, on the balance between different forms of control. Citalopram had distinct effects, increasing participants' tendency to perform active responses independent of outcome valence, consistent with a role in decreasing motor inhibition.
Our findings highlight the rich complexities of the roles played by dopamine and serotonin during instrumental learning.
ABSTRACT: BACKGROUND: Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. METHODS: Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. RESULTS: MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. CONCLUSION: Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling, and was preferentially modulated by a low dose of the dopamine agonist pramipexole. The second related directly to reward sensitivity, and was preferentially reduced in MDD and anhedonia. Stress altered both components. Collectively, these findings highlight the value of model-based reinforcement-learning meta-analysis for dissecting anhedonic behavior.
Biology of mood & anxiety disorders. 06/2013; 3(1):12.
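The distinction above between reward sensitivity and learning rate can be made concrete with a minimal Q-learning sketch. This is an illustrative form only, not the fitted models from the meta-analysis: a sensitivity parameter `rho` scales how large an outcome feels, while the learning rate `eps` scales how fast the value estimate moves.

```python
def q_update(q, reward, rho, eps):
    """One trial of a delta-rule update: reward sensitivity rho scales the
    outcome; learning rate eps scales how fast the estimate moves."""
    prediction_error = rho * reward - q
    return q + eps * prediction_error

# Two agents with the same learning rate but different reward sensitivity:
# both converge, but to different asymptotic values (rho * reward).
q_high, q_low = 0.0, 0.0
for _ in range(100):
    q_high = q_update(q_high, reward=1.0, rho=1.0, eps=0.1)
    q_low = q_update(q_low, reward=1.0, rho=0.5, eps=0.1)
```

Under this parameterisation, blunted reward sensitivity (low `rho`) changes where learning ends up, whereas a changed learning rate only alters how quickly it gets there, which is why the two contributions are separable in behaviour.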
ABSTRACT: A wealth of studies has found that adapting to second-order visual stimuli has little effect on the perception of first-order stimuli. This is physiologically and psychologically troubling, since many cells show similar tuning to both classes of stimuli, and since adapting to first-order stimuli leads to aftereffects that do generalize to second-order stimuli. Focusing on high-level visual stimuli, we recently proposed the novel explanation that the lack of transfer arises partially from the characteristically different backgrounds of the two stimulus classes. Here, we consider the effect of stimulus backgrounds in the far more prevalent, lower-level, case of the orientation tilt aftereffect. Using a variety of first- and second-order oriented stimuli, we show that we could increase or decrease both within- and cross-class adaptation aftereffects by increasing or decreasing the similarity of the otherwise apparently uninteresting or irrelevant backgrounds of adapting and test patterns. Our results suggest that similarity between background statistics of the adapting and test stimuli contributes to low-level visual adaptation, and that these backgrounds are thus not discarded by visual processing but provide contextual modulation of adaptation. Null cross-adaptation aftereffects must also be interpreted cautiously. These findings reduce the apparent inconsistency between psychophysical and neurophysiological data about first- and second-order stimuli.
ABSTRACT: Receptive fields acquired through unsupervised learning of sparse representations of natural scenes have similar properties to primary visual cortex (V1) simple cell receptive fields. However, what drives in vivo development of receptive fields remains controversial. The strongest evidence for the importance of sensory experience in visual development comes from receptive field changes in animals reared with abnormal visual input. However, most sparse coding accounts have considered only normal visual input and the development of monocular receptive fields. Here, we applied three sparse coding models to binocular receptive field development across six abnormal rearing conditions. In every condition, the changes in receptive field properties previously observed experimentally were matched to a similar and highly faithful degree by all the models, suggesting that early sensory development can indeed be understood in terms of an impetus towards sparsity. As previously predicted in the literature, we found that asymmetries in inter-ocular correlation across orientations lead to orientation-specific binocular receptive fields. Finally, we used our models to design a novel stimulus that, if present during rearing, is predicted by the sparsity principle to lead robustly to radically abnormal receptive fields.
ABSTRACT: Neural representations of the effort deployed in performing actions, and the valence of the outcomes they yield, form the foundation of action choice. To discover whether brain areas represent effort and outcome valence together or if they represent one but not the other, we examined these variables in an explicitly orthogonal way. We did this by asking human subjects to exert one of two levels of effort to improve their chances of either winning or avoiding the loss of money. Subjects responded faster both when exerting greater effort and when exerting effort in anticipation of winning money. Using fMRI, we inspected BOLD responses during anticipation (before any action was executed) and when the outcome was delivered. In this way, we indexed BOLD signals associated with an anticipated need to exert effort and its affective consequences, as well as the effect of executed effort on the representation of outcomes. Anterior cingulate cortex and dorsal striatum (dorsal putamen) signaled the anticipation of effort independently of the prospect of winning or losing. Activity in ventral striatum (ventral putamen) was greater for better-than-expected outcomes compared with worse-than-expected outcomes, an effect attenuated in the context of having exerted greater effort. Our findings provide evidence that neural representations of anticipated actions are sensitive to the expected demands, but not to the expected value of their consequence, whereas representations of outcome value are discounted by exertion, commensurate with an integration of cost and benefit so as to approximate net value.
Journal of Neuroscience 04/2013; 33(14):6160-9. · 6.91 Impact Factor
ABSTRACT: Senescence affects the ability to utilize information about the likelihood of rewards for optimal decision-making. Using functional magnetic resonance imaging in humans, we found that healthy older adults had an abnormal signature of expected value, resulting in an incomplete reward prediction error (RPE) signal in the nucleus accumbens, a brain region that receives rich input projections from substantia nigra/ventral tegmental area (SN/VTA) dopaminergic neurons. Structural connectivity between SN/VTA and striatum, measured by diffusion tensor imaging, was tightly coupled to inter-individual differences in the expression of this expected reward value signal. The dopamine precursor levodopa (L-DOPA) increased the task-based learning rate and task performance in some older adults to the level of young adults. This drug effect was linked to restoration of a canonical neural RPE. Our results identify a neurochemical signature underlying abnormal reward processing in older adults and indicate that this can be modulated by L-DOPA.
ABSTRACT: Subjects routinely control the vigor with which they emit motoric responses. However, the bulk of formal treatments of decision-making ignores this dimension of choice. A recent theoretical study suggested that action vigor should be influenced by the experienced average reward rate, and that this rate is encoded by tonic dopamine in the brain. We previously examined how average reward rate modulates vigor as exemplified by response times, and found a measure of agreement with the first suggestion. In the current study we examined the second suggestion, namely the potential influence of dopamine signaling on vigor. Ninety healthy subjects participated in a double-blind experiment in which they received one of the following: placebo, L-DOPA (which increases dopamine levels in the brain) or citalopram (which has a selective, if complex, effect on serotonin levels). Subjects performed multiple trials of a rewarded odd-ball discrimination task in which we varied the potential reward over time in order to exercise the putative link between vigor and average reward rate. Replicating our prior findings, we found that a significant fraction of the variance in subjects' responses could be explained by our experimentally manipulated changes in average reward rate. Crucially, this relationship was significantly stronger under L-DOPA than under placebo, suggesting that the impact of average reward levels on action vigor is indeed subject to a dopaminergic influence.
Neuropsychopharmacology accepted article preview online, 18 February 2013; doi:10.1038/npp.2013.48.
Neuropsychopharmacology: official publication of the American College of Neuropsychopharmacology 02/2013; · 6.99 Impact Factor
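The vigor theory referenced above links response speed to an experienced average reward rate. A hypothetical sketch, assuming the rate is tracked by an exponential moving average and that latency falls linearly with it (the `baseline` and `gain` values are illustrative, not fitted parameters from the study):

```python
def update_avg_rate(rate, reward, alpha=0.1):
    """Exponential moving average of per-trial reward: a simple stand-in
    for the tonic-dopamine average-reward signal in the theory."""
    return rate + alpha * (reward - rate)

def response_time(avg_rate, baseline=800.0, gain=300.0):
    """Higher average reward rate -> faster (smaller) response time, in ms.
    baseline and gain are illustrative parameters only."""
    return baseline - gain * avg_rate

rate = 0.0
for reward in [1, 1, 1, 0, 1, 1]:               # mostly rewarded trials
    rate = update_avg_rate(rate, reward)
rt_rich = response_time(rate)                    # RT after a rich history

rate = 0.0
for reward in [0, 0, 1, 0, 0, 0]:               # mostly unrewarded trials
    rate = update_avg_rate(rate, reward)
rt_poor = response_time(rate)
# rt_rich < rt_poor: responses speed up when the average reward rate is high
```

The study's finding would correspond, in this toy form, to L-DOPA increasing the effective `gain` coupling the tracked rate to response time.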
ABSTRACT: Reciprocating exchange with other humans requires individuals to infer the intentions of their partners. Despite the importance of this ability in healthy cognition and its impact in disease, the dimensions employed and computations involved in such inferences are not clear. We used a computational theory-of-mind model to classify styles of interaction in 195 pairs of subjects playing a multi-round economic exchange game. This classification produces an estimate of a subject's depth-of-thought in the game (low, medium, high), a parameter that governs the richness of the models they build of their partner. Subjects in each category showed distinct neural correlates of learning signals associated with different depths-of-thought. The model also detected differences in depth-of-thought between two groups of healthy subjects: one playing patients with psychiatric disease and the other playing healthy controls. The neural response categories identified by this computational characterization of theory-of-mind may yield objective biomarkers useful in the identification and characterization of pathologies that perturb the capacity to model and interact with other humans.
ABSTRACT: Reinforcement learning (RL) has become a dominant computational paradigm for modeling psychological and neural aspects of affectively charged decision-making tasks. RL is normally construed in terms of the interaction between a subject and its environment, with the former emitting actions, and the latter providing stimuli, and appetitive and aversive reinforcement. However, there is recent emphasis on redrawing the boundary between the two, with the organism constructing its own notion of reward, punishment and state, and with internal actions, such as the gating of working memory, being treated on an equal footing with external manipulation of the environment. We review recent work in this area, focusing on cognitive control.
Current opinion in neurobiology 06/2012; · 7.21 Impact Factor
ABSTRACT: Establishing a function for the neuromodulator serotonin in human decision-making has proved remarkably difficult because of its complex role in reward and punishment processing. In a novel choice task where actions led concurrently and independently to the stochastic delivery of both money and pain, we studied the impact of decreased brain serotonin induced by acute dietary tryptophan depletion. Depletion selectively impaired both behavioral and neural representations of reward outcome value, and hence the effective exchange rate by which rewards and punishments were compared. This effect was computationally and anatomically distinct from a separate effect on increasing outcome-independent choice perseveration. Our results provide evidence for a surprising role for serotonin in reward processing, while illustrating its complex and multifarious effects.
Journal of Neuroscience 04/2012; 32(17):5833-42. · 6.91 Impact Factor
ABSTRACT: Dopamine is widely observed to signal anticipation of future rewards and thus thought to be a key contributor to affectively charged decision making. However, the experiments supporting this view have not dissociated rewards from the actions that lead to, or are occasioned by, them. Here, we manipulated dopamine pharmacologically and examined the effect on a task that explicitly dissociates action and reward value. We show that dopamine enhanced the neural representation of rewarding actions, without significantly affecting the representation of reward value as such. Thus, increasing dopamine levels with levodopa selectively boosted striatal and substantia nigra/ventral tegmental representations associated with actions leading to reward, but not with actions leading to the avoidance of punishment. These findings highlight a key role for dopamine in the generation of appetitively motivated actions.
Proceedings of the National Academy of Sciences 04/2012; 109(19):7511-6. · 9.74 Impact Factor
ABSTRACT: Decision-making invokes two fundamental axes of control: affect or valence, spanning reward and punishment, and effect or action, spanning invigoration and inhibition. We studied the acquisition of instrumental responding in healthy human volunteers in a task in which we orthogonalized action requirements and outcome valence. Subjects were much more successful in learning active choices in rewarded conditions, and passive choices in punished conditions. Using computational reinforcement-learning models, we teased apart contributions from putatively instrumental and Pavlovian components in the generation of the observed asymmetry during learning. Moreover, using model-based fMRI, we showed that BOLD signals in striatum and substantia nigra/ventral tegmental area (SN/VTA) correlated with instrumentally learnt action values, but with opposite signs for go and no-go choices. Finally, we showed that successful instrumental learning depends on engagement of bilateral inferior frontal gyrus. Our behavioral and computational data showed that instrumental learning is contingent on overcoming inherent and plastic Pavlovian biases, while our neuronal data showed this learning is linked to unique patterns of brain activity in regions implicated in action and inhibition, respectively.
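The model class described above combines instrumental action values with a Pavlovian term that couples stimulus value to action. A minimal sketch of the choice rule only (hypothetical and simplified; the fitted models in this literature include learning rates, lapse terms and further parameters):

```python
import math

def p_go(q_go, q_nogo, v_stim, pav_weight, go_bias=0.0):
    """Probability of a 'go' response: a softmax over the instrumental value
    difference, plus a Pavlovian term that pushes toward 'go' for positively
    valued stimuli (v_stim > 0) and toward 'no-go' for negative ones."""
    w_go = q_go + go_bias + pav_weight * v_stim
    return 1.0 / (1.0 + math.exp(-(w_go - q_nogo)))

# With equal instrumental values, a reward-predicting stimulus still biases
# the agent toward 'go' and a punishment-predicting one toward 'no-go' --
# the source of the go-to-win / no-go-to-avoid asymmetry in the task.
p_reward_cue = p_go(q_go=0.0, q_nogo=0.0, v_stim=1.0, pav_weight=0.5)
p_punish_cue = p_go(q_go=0.0, q_nogo=0.0, v_stim=-1.0, pav_weight=0.5)
```

In this form, "overcoming the Pavlovian bias" means instrumental learning must drive the `q` values far enough apart to outweigh the `pav_weight * v_stim` term.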
ABSTRACT: The role dopamine plays in decision-making has important theoretical, empirical and clinical implications. Here, we examined its precise contribution by exploiting the lesion deficit model afforded by Parkinson's disease. We studied patients in a two-stage reinforcement learning task, while they were ON and OFF dopamine replacement medication. Contrary to expectation, we found that dopaminergic drug state (ON or OFF) did not impact learning. Instead, the critical factor was drug state during the performance phase, with patients ON medication choosing correctly significantly more frequently than those OFF medication. This effect was independent of drug state during initial learning and appears to reflect a facilitation of generalization for learnt information. This inference is bolstered by our observation that neural activity in nucleus accumbens and ventromedial prefrontal cortex, measured during simultaneously acquired functional magnetic resonance imaging, represented learnt stimulus values during performance. This effect was expressed solely during the ON state, with activity in these regions correlating with better performance. Our data indicate that dopamine modulation of nucleus accumbens and ventromedial prefrontal cortex exerts a specific effect on choice behaviour distinct from pure learning. The findings are in keeping with substantial other evidence that certain aspects of learning are unaffected by dopamine lesions or depletion, and that dopamine plays a key role in performance that may be distinct from its role in learning.
ABSTRACT: Investigations of the underlying mechanisms of choice in humans have focused on learning from prediction errors, leaving the computational structure of value-based planning comparatively underexplored. Using behavioral and neuroimaging analyses of a minimax decision task, we found that the computational processes underlying forward planning are expressed in the anterior caudate nucleus as values of individual branching steps in a decision tree. In contrast, values represented in the putamen pertain solely to values learned during extensive training. During actual choice, both striatal areas showed a functional coupling to ventromedial prefrontal cortex, consistent with this region acting as a value comparator. Our findings point toward an architecture of choice in which segregated value systems operate in parallel in the striatum for planning and extensively trained choices, with medial prefrontal cortex integrating their outputs.
ABSTRACT: Spatial context in images induces perceptual phenomena associated with salience and modulates the responses of neurons in primary visual cortex (V1). However, the computational and ecological principles underlying contextual effects are incompletely understood. We introduce a model of natural images that includes grouping and segmentation of neighboring features based on their joint statistics, and we interpret the firing rates of V1 neurons as performing optimal recognition in this model. We show that this leads to a substantial generalization of divisive normalization, a computation that is ubiquitous in many neural areas and systems. A main novelty in our model is that the influence of the context on a target stimulus is determined by their degree of statistical dependence. We optimized the parameters of the model on natural image patches, and then simulated neural and perceptual responses on stimuli used in classical experiments. The model reproduces some rich and complex response patterns observed in V1, such as the contrast dependence, orientation tuning and spatial asymmetry of surround suppression, while also allowing for surround facilitation under conditions of weak stimulation. It also mimics the perceptual salience produced by simple displays, and leads to readily testable predictions. Our results provide a principled account of orientation-based contextual modulation in early vision and its sensitivity to the homogeneity and spatial arrangement of inputs, and lend statistical support to the theory that V1 computes visual salience.
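For readers unfamiliar with divisive normalization, the canonical form is simple: a unit's drive is divided by pooled surround activity. The sketch below is that textbook form with an added, purely hypothetical `weights` argument standing in for the dependence-based gating the abstract describes (down-weighting surround units judged statistically independent of the center); the paper's actual model is derived from natural image statistics, not this heuristic.

```python
def divisive_normalization(drive, surround, sigma=1.0, weights=None):
    """Canonical divisive normalization: squared drive divided by a
    semisaturation constant plus pooled squared surround activity.
    'weights' is a hypothetical stand-in for dependence-based gating:
    a weight near 0 removes a surround unit's suppressive influence."""
    if weights is None:
        weights = [1.0] * len(surround)
    pool = sigma ** 2 + sum(w * s ** 2 for w, s in zip(weights, surround))
    return drive ** 2 / pool

center = 2.0
# A statistically dependent ("grouped") surround suppresses the response...
grouped = divisive_normalization(center, [2.0, 2.0], weights=[1.0, 1.0])
# ...while an independent ("segmented") surround, down-weighted, suppresses less,
# leaving the center relatively salient.
segmented = divisive_normalization(center, [2.0, 2.0], weights=[0.1, 0.1])
```

This captures, in toy form, why a target on an unrelated background pops out while the same target embedded in a homogeneous texture does not.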
ABSTRACT: When planning a series of actions, it is usually infeasible to consider all potential future sequences; instead, one must prune the decision tree. Provably optimal pruning is, however, still computationally ruinous and the specific approximations humans employ remain unknown. We designed a new sequential reinforcement-based task and showed that human subjects adopted a simple pruning strategy: during mental evaluation of a sequence of choices, they curtailed any further evaluation of a sequence as soon as they encountered a large loss. This pruning strategy was Pavlovian: it was reflexively evoked by large losses and persisted even when overwhelmingly counterproductive. It was also evident above and beyond loss aversion. We found that the tendency towards Pavlovian pruning was selectively predicted by the degree to which subjects exhibited sub-clinical mood disturbance, in accordance with theories that ascribe Pavlovian behavioural inhibition, via serotonin, a role in mood disorders. We conclude that Pavlovian behavioural inhibition shapes highly flexible, goal-directed choices in a manner that may be important for theories of decision-making in mood disorders.
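The pruning strategy described above can be sketched as a depth-first tree evaluation that simply refuses to look past a large loss. The tree structure, payoffs and threshold below are illustrative, not the task's actual values:

```python
def evaluate(tree, prune_threshold=-70):
    """Depth-first evaluation with Pavlovian pruning: any branch that begins
    with a loss at or below prune_threshold is cut off, so deeper gains on
    that branch are never seen. A node is (immediate_reward, [children])."""
    branch_values = []
    for reward, children in tree:
        if reward <= prune_threshold:
            value = reward                      # pruned: evaluation stops here
        else:
            value = reward + (evaluate(children, prune_threshold)
                              if children else 0)
        branch_values.append(value)
    return max(branch_values)

# A large loss (-100) guarding a big gain (+200) is pruned, so the agent
# prefers the safe branch (net +10) even though the pruned path is
# objectively better (net +100) -- pruning persists when counterproductive.
tree = [(-100, [(200, [])]),
        (-20,  [(30,  [])])]
choice_value = evaluate(tree)
```

Removing the threshold (setting it below any attainable loss) recovers full look-ahead, which is the comparison the behavioural modelling exploits.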
ABSTRACT: Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
Journal of Neuroscience 01/2012; 32(2):551-62. · 6.91 Impact Factor
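The contrast drawn above, between learning a cue's mean value and learning its variance, can be sketched by running a second delta rule on squared prediction errors alongside the usual one. This is a generic illustration, not the specific model fitted in the study:

```python
import random

def learn_value_and_risk(rewards, alpha=0.1):
    """Delta-rule learning of a cue's mean value plus, in parallel, its
    risk: squared prediction errors update a running variance estimate."""
    value, risk = 0.0, 0.0
    for r in rewards:
        delta = r - value                      # reward prediction error
        value += alpha * delta
        risk += alpha * (delta ** 2 - risk)    # "risk prediction error"
    return value, risk

random.seed(0)
safe = [1.0] * 500                                       # fixed reward
risky = [random.choice([0.0, 2.0]) for _ in range(500)]  # same mean, high variance
v_safe, risk_safe = learn_value_and_risk(safe)
v_risky, risk_risky = learn_value_and_risk(risky)
# Both cues converge to similar mean values, but only the risky cue
# acquires a substantial risk estimate.
```

The study's point is that cues like `safe` and `risky`, matched in learned mean value, nonetheless evoke different neural and behavioural responses, which requires something beyond the first delta rule.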
ABSTRACT: Bradykinesia is a cardinal feature of Parkinson's disease (PD). Despite its disabling impact, the precise cause of this symptom remains elusive. Recent thinking suggests that bradykinesia may be more than simply a manifestation of motor slowness, and may in part reflect a specific deficit in the operation of motivational vigour in the striatum. In this paper we test the hypothesis that movement time in PD can be modulated by the specific nature of the motivational salience of possible action-outcomes.
We developed a novel movement time paradigm involving winnable rewards and avoidable painful electrical stimuli. The faster the subjects performed an action, the more likely they were to win money (in appetitive blocks) or to avoid a painful shock (in aversive blocks). We compared PD patients when OFF dopaminergic medication with controls. Our key finding is that PD patients OFF dopaminergic medication move faster to avoid aversive outcomes (painful electric shocks) than to reap rewarding outcomes (winning money) and, unlike controls, do not speed up in the current trial having failed to win money in the previous one. We also demonstrate that sensitivity to distracting stimuli is valence specific.
We suggest this pattern of results can be explained in terms of low dopamine levels in the Parkinsonian state leading to an insensitivity to appetitive outcomes, and thus an inability to modulate movement speed in the face of rewards. By comparison, sensitivity to aversive stimuli is relatively spared. Our findings point to a rarely described property of bradykinesia in PD, namely its selective regulation by everyday outcomes.
PLoS ONE 01/2012; 7(10):e47138. · 3.73 Impact Factor