Peter Dayan’s research while affiliated with University of Tübingen and other places


Publications (644)


Weighting waiting: A decision-theoretic taxonomy of delay, pacing and procrastination
  • Preprint

January 2025

Sahiti Chebolu

·

Peter Dayan

Why do today, what you can fail to do tomorrow? Pacing styles involving postponing, and ultimately procrastinating, tasks are widespread pathologies to which many succumb. Previous research has recognised multiple types of delayed working, influenced by a myriad of psychological and situational factors. However, the main mechanistic explanation for delays, which concerns temporal discounting of distant rewards, only encompasses a subset. Further, investigations of pacing have been rather independent of those of procrastination. Here, we introduce a systematic taxonomy of pacing and procrastination within a common framework that is based on the type of temporal decisions involved in choosing to delay. We suggest that these decisions are driven by characteristics of the task and the (sub-)optimality of the decision-making. We illustrate aspects of the taxonomy by simulating diverse sources of behavioral delay using reinforcement learning models. We also analyse whether students pacing their work through a semester in a real-world task (Zhang & Ma, Scientific Reports, 2024) show any evidence of behaving according to the types detailed in our taxonomy. Our approach provides a theoretical foundation for understanding pacing and procrastination, enabling the integration of both established and novel mechanisms within a unified conceptual framework.
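
The abstract above names temporal discounting as the standard mechanistic account of delay. As a loose illustration only (not the authors' taxonomy or their reinforcement learning models), the sketch below shows how exponential discounting of an immediate effort cost, with the reward fixed at a deadline, can make postponement look locally attractive; all parameter values are hypothetical.

```python
# Toy illustration (not the authors' model): exponential discounting can make
# postponing an effortful task look attractive when its reward arrives at a fixed deadline.
def discounted_value(outcome: float, delay: int, gamma: float = 0.95) -> float:
    """Present value of an outcome received `delay` steps in the future."""
    return (gamma ** delay) * outcome

effort_cost = -2.0   # immediate cost of doing the task (hypothetical)
task_reward = 10.0   # reward delivered at the deadline (hypothetical)
deadline = 10        # steps until the deadline

for start_day in range(deadline):
    # Acting on `start_day`: pay the cost then; collect the reward at the deadline.
    value = discounted_value(effort_cost, start_day) + discounted_value(task_reward, deadline)
    print(f"start day {start_day}: present value = {value:.2f}")
# Because the reward arrives at the deadline regardless, pushing the cost later always
# raises the myopic present value here, which is one (of several) routes to procrastination.
```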


Figure 3. ACC encodes transition information until feedback. A, Average coefficient of partial determination (CPD) across neurons in each region for the encoding of the transition that occurred. Solid horizontal lines represent periods where that area's CPD differed significantly from all other areas' CPDs (p<0.05, cluster-based permutation test). Dashed horizontal line indicates the confidence interval for each region derived from the null distribution. All epochs are 0-1000 ms. B, Percent of neurons within each region that significantly encoded transition (p<0.05, cluster-based permutation test). C, D, Same as A, B, but for encoding of the interaction of previous reward and transition. E, ACC coefficients for transition and feedback were correlated (coefficients taken from 300 ms post onset of each epoch). In A, B, and D, solid horizontal line indicates periods where ACC was significantly greater than all three other regions (A: permutation test; B and D: chi-squared test; p<0.05/3).
Figure 4. Value estimate encoding was predominantly model-based. A, Percentage of cells in each region that encoded the model-based (MB) derived estimates of the value of each of the choice 1 options. Solid horizontal lines indicate periods where the percentage of neurons was significant (p<0.05, binomial test). B, Same as in A, but for model-free (MF) derived estimates of each option's value. C, D, Same as in A, B, but for the average CPD across neurons in a region. Dashed lines represent the 95% confidence interval determined by permutation testing. Solid lines indicate periods where the strength of encoding was significant (p<0.05, cluster-based permutation test). E, Distribution of the peak CPD values (i.e., the highest CPD value observed over the epoch shown in A-D for either PicA or PicB cues) for each neuron during the epochs shown in C (MB, blue) and D (MF, orange). Horizontal lines indicate median and extrema. *, ***: p<0.05, 0.001, paired t-test. F, The percentage of neurons in each region that significantly encoded a MB estimate of the chosen option's value (left), a MF estimate (middle), or both (right), assessed using cluster-length permutation testing. Coloured asterisks indicate that the population is significantly greater than 10% (blue and orange) or 1% (green), binomial test. Asterisks between bars indicate a difference in size between the two populations (*, ***: p<0.05, 0.001, chi-squared test).
Neural signatures of model-based and model-free reinforcement learning across prefrontal cortex and striatum
  • Preprint
  • File available

January 2025

·

6 Reads

·

James L Butler

·

[...]

·

Steven W Kennerley

Animals integrate knowledge about how the state of the environment evolves to choose actions that maximise reward. Such goal-directed behaviour - or model-based (MB) reinforcement learning (RL) - can flexibly adapt choice to changes, and is thus distinct from simpler habitual - or model-free (MF) RL - strategies. Previous inactivation and neuroimaging work implicates prefrontal cortex (PFC) and the caudate striatal region in MB-RL; however, details are scarce about its implementation at the single-neuron level. Here, we recorded from two PFC regions - the dorsal anterior cingulate cortex (ACC) and dorsolateral PFC (DLPFC) - and two striatal regions - caudate and putamen - while two rhesus macaques performed a sequential decision-making (two-step) task in which MB-RL involves knowledge about the statistics of reward and state transitions. All four regions, but particularly the ACC, encoded the rewards received and tracked the probabilistic state transitions that occurred. However, it was primarily ACC (and to a lesser extent caudate) that encoded the key variable of the task - namely the interaction between reward, transition and choice - which underlies MB decision-making. ACC and caudate neurons also encoded MB-derived estimates of choice values. Moreover, caudate value estimates of the choice options flipped when a rare transition occurred, demonstrating value updates based on structural knowledge of the task. The striatal regions were unique (relative to PFC) in encoding the current and previous rewards with opposing polarities, reminiscent of dopaminergic neurons and indicative of a MF prediction error. Our findings provide a deeper understanding of selective and temporally dissociable neural mechanisms underlying goal-directed behaviour.
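
For readers unfamiliar with the two-step task, the sketch below illustrates the standard distinction between model-based values (computed from known transition probabilities) and model-free values (learned directly from reward). It follows the generic two-step task logic, not the specific model fit in this preprint; learning rate, transition probability, and reward probabilities are assumptions.

```python
import numpy as np

# Hedged sketch of generic two-step task logic, not the exact model fit in the preprint.
rng = np.random.default_rng(0)

n_trials = 200
p_common = 0.7                       # probability a first-stage choice leads to "its" second-stage state
reward_prob = np.array([0.8, 0.2])   # reward probability at the two second-stage states
alpha = 0.3                          # learning rate

q_mf = np.zeros(2)      # model-free values of the two first-stage options
v_stage2 = np.zeros(2)  # learned values of the two second-stage states

for t in range(n_trials):
    # Model-based values use the known transition structure explicitly.
    q_mb = np.array([
        p_common * v_stage2[0] + (1 - p_common) * v_stage2[1],
        p_common * v_stage2[1] + (1 - p_common) * v_stage2[0],
    ])
    choice = int(np.argmax(q_mb))    # greedy MB chooser, purely for illustration

    # Sample the transition and the reward.
    common = rng.random() < p_common
    state2 = choice if common else 1 - choice
    reward = float(rng.random() < reward_prob[state2])

    # Model-free update ignores the transition structure entirely.
    q_mf[choice] += alpha * (reward - q_mf[choice])
    # Second-stage value update, shared by both systems.
    v_stage2[state2] += alpha * (reward - v_stage2[state2])

print("MF first-stage values:", q_mf)
print("Second-stage values:", v_stage2)
```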


Navigating uncertainty: reward location variability induces reorganization of hippocampal spatial representations

January 2025

·

6 Reads

Navigating uncertainty is crucial for survival, with the location and availability of reward varying in different and unsignalled ways. Hippocampal place cell populations over-represent salient locations in an animal's environment, including those associated with rewards; however, how spatial uncertainty impacts the cognitive map is unclear. We report a virtual spatial navigation task designed to test the impact of different levels and types of uncertainty about reward on place cell populations. When the reward location changed on a trial-by-trial basis, inducing expected uncertainty, a greater proportion of place cells followed along, and the reward and the track end became anchors of a warped spatial metric. When the reward location then unexpectedly moved, the fraction of reward place cells that followed was greater when starting from a state of expected, compared to low, uncertainty. Overall, we show that different forms of potentially interacting uncertainty generate remapping in parallel, task-relevant reference frames.


Figure 3: a. The effect of hierarchical memory structure on parsing search steps. b. Comparison between models in terms of sequence length after parsing. GT denotes the ground truth sequence length by the generative model. c. Model comparison based on sequence likelihood. d. Model comparison based on coding efficiency. Example model comparison with sequence length |S| = 1000, nested hierarchy depth d = 30, atomic set of size |A| = 10.
Figure 5: Comparison between human, variations of cognitive models, and AI models on the transfer block of the memory recall experiment. Bar plot shows the average human sequence recall time, and the sequence negative log-likelihood evaluated by the various cognitive and large language models.
Figure 7: Common adjacency and preadjacency structure to identify a variable.
Figure 8: Example process when parsing a chunk using the parsing graph. The gray region denotes chunk overlaps, i.e., the non-leaf nodes inside the parsing graph that form the common prefix of their children.
Figure 10: Regressing all models' sequence likelihood on human sequence recall time.
Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences

October 2024

·

61 Reads

Humans excel at learning abstract patterns across different sequences, filtering out irrelevant details, and transferring these generalized concepts to new sequences. In contrast, many sequence learning models lack the ability to abstract, which leads to memory inefficiency and poor transfer. We introduce a non-parametric hierarchical variable learning model (HVM) that learns chunks from sequences and abstracts contextually similar chunks as variables. HVM efficiently organizes memory while uncovering abstractions, leading to compact sequence representations. When learning on language datasets such as babyLM, HVM learns a more efficient dictionary than standard compression algorithms such as Lempel-Ziv. In a sequence recall task requiring the acquisition and transfer of variables embedded in sequences, we demonstrate that HVM's sequence likelihood correlates with human recall times. In contrast, large language models (LLMs) struggle to transfer abstract variables as effectively as humans. Using HVM's adjustable layer of abstraction, we demonstrate that the model realizes a precise trade-off between compression and generalization. Our work offers a cognitive model that captures the learning and transfer of abstract representations in human cognition and differentiates itself from the behavior of large language models.
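
HVM itself is non-parametric and abstracts contextually similar chunks into variables; the toy sketch below only illustrates the chunk-learning half of that idea, via greedy merging of frequent adjacent symbol pairs (in the spirit of dictionary learners such as Lempel-Ziv or byte-pair encoding). It is an assumption-laden analogue, not the published algorithm.

```python
from collections import Counter

# Toy chunk learning in the spirit of the paper's chunking step (not HVM itself):
# repeatedly merge the most frequent adjacent pair of symbols into a new chunk.
def learn_chunks(sequence, n_merges=3):
    seq = list(sequence)
    dictionary = set(seq)
    for _ in range(n_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats; stop growing the dictionary
        chunk = a + b
        dictionary.add(chunk)
        # Re-parse the sequence with the new chunk.
        new_seq, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                new_seq.append(chunk)
                i += 2
            else:
                new_seq.append(seq[i])
                i += 1
        seq = new_seq
    return seq, dictionary

parsed, chunks = learn_chunks("abcabcabxabc")
print(parsed)   # shorter parse built from the learned chunks
print(chunks)   # dictionary now contains multi-symbol chunks such as 'ab' and 'abc'
```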


Fig. 2 Performance on Psych-101. a, Pseudo-R² values for different models across experiments. A value of zero corresponds to prediction at chance level, while a value of one corresponds to perfect predictability of human responses. Missing bars indicate performance below chance level. Centaur outperforms both Llama and a collection of domain-specific cognitive models in almost every experiment. Note that we only included experiments for which we have implemented a domain-specific cognitive model in this graphic, and merged different studies using the same paradigm. A full table for all experiments can be found in the Supplementary Information. b, Model simulations on the two-step task. The plot visualizes probability densities over reward and a parameter indicating how model-based the learning was, for both people and simulated runs of Centaur. c, Model simulations on the horizon task. The plot visualizes probability densities over reward and an information bonus parameter for both people and simulated runs of Centaur. d, Model simulations on a grammar judgement task. The plot visualizes probability densities over true and estimated scores (i.e., number of correct responses out of twenty) for both people and simulated runs of Centaur.
Centaur: a foundation model of human cognition

October 2024

·

500 Reads

·

2 Citations

Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. Here we introduce Centaur, a computational model that can predict and simulate human behavior in any experiment expressible in natural language. We derived Centaur by finetuning a state-of-the-art language model on a novel, large-scale data set called Psych-101. Psych-101 reaches an unprecedented scale, covering trial-by-trial data from over 60,000 participants performing over 10,000,000 choices in 160 experiments. Centaur not only captures the behavior of held-out participants better than existing cognitive models, but also generalizes to new cover stories, structural task modifications, and entirely new domains. Furthermore, we find that the model's internal representations become more aligned with human neural activity after finetuning. Taken together, Centaur is the first real candidate for a unified model of human cognition. We anticipate that it will have a disruptive impact on the cognitive sciences, challenging the existing paradigm for developing computational models.
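
The comparison in Fig. 2a is reported as a pseudo-R² over human choices (zero at chance, one at perfect prediction). A minimal McFadden-style version of such a metric, assuming one already has the probabilities a model assigned to each observed choice, might look like the sketch below; the exact metric used in the paper may differ, and the example numbers are hypothetical.

```python
import numpy as np

# Hedged sketch of a McFadden-style pseudo-R^2 of the kind plotted in Fig. 2a:
# 0 = chance-level prediction of human choices, 1 = perfect prediction.
def pseudo_r2(choice_probs: np.ndarray, n_options: int) -> float:
    """choice_probs: model probability assigned to each observed human choice."""
    ll_model = np.sum(np.log(choice_probs))
    ll_chance = len(choice_probs) * np.log(1.0 / n_options)
    return 1.0 - ll_model / ll_chance

# Hypothetical example: probabilities a model assigned to 5 observed two-option choices.
probs = np.array([0.9, 0.7, 0.6, 0.8, 0.55])
print(f"pseudo-R^2 = {pseudo_r2(probs, n_options=2):.3f}")
```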


Figure 3: Object decoding accuracy as a function of training set size, for contrastive models. CWM representations of objects become more linearly separable with dataset size, despite no architectural components that encourage the formation of object-centric representations. However, contrastive learning without next step prediction (CRL) does not give rise to object-centric representations, suggesting an important role for information provided by dynamic data. Scores are averaged over five seeds (three seeds in the MOVi domains), with error bars depicting standard error of the mean.
Figure 7: Alignment with slot models as determined by representational similarity analysis (RSA) (Kriegeskorte et al., 2008; Kornblith et al., 2019). The representations of the contrastive model (CWM) become more aligned with its slotted counterpart with more data.
Contrastive model hyperparameters.
Auto-encoder hyperparameters.
Sequential auto-encoder hyperparameters.
Next state prediction gives rise to entangled, yet compositional representations of objects

October 2024

·

48 Reads

Compositional representations are thought to enable humans to generalize across combinatorially vast state spaces. Models with learnable object slots, which encode information about objects in separate latent codes, have shown promise for this type of generalization but rely on strong architectural priors. Models with distributed representations, on the other hand, use overlapping, potentially entangled neural codes, and their ability to support compositional generalization remains underexplored. In this paper we examine whether distributed models can develop linearly separable representations of objects, like slotted models, through unsupervised training on videos of object interactions. We show that, surprisingly, models with distributed representations often match or outperform models with object slots in downstream prediction tasks. Furthermore, we find that linearly separable object representations can emerge without object-centric priors, with auxiliary objectives like next-state prediction playing a key role. Finally, we observe that distributed models' object representations are never fully disentangled, even if they are linearly separable: Multiple objects can be encoded through partially overlapping neural populations while still being highly separable with a linear classifier. We hypothesize that maintaining partially shared codes enables distributed models to better compress object dynamics, potentially enhancing generalization.
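
As a rough sketch of the kind of auxiliary next-state prediction objective discussed above, the snippet below pairs an encoder's frame embeddings at t and t+1 with an InfoNCE-style contrastive loss, treating other batch items as negatives. The predictor, embedding dimension, and temperature are assumptions for illustration, not the paper's training code.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of a contrastive next-state prediction objective (InfoNCE-style),
# under illustrative assumptions; not the exact objective used in the paper.
def contrastive_next_state_loss(z_t, z_next, predictor, temperature=0.1):
    """z_t, z_next: (batch, dim) encodings of frames at times t and t+1."""
    pred = F.normalize(predictor(z_t), dim=-1)      # predicted next-state embedding
    target = F.normalize(z_next, dim=-1)
    logits = pred @ target.T / temperature          # similarity to all candidates in the batch
    labels = torch.arange(z_t.shape[0])             # positive = the matching index
    return F.cross_entropy(logits, labels)          # other batch items serve as negatives

# Hypothetical usage with a linear predictor over 64-d embeddings and random inputs.
predictor = torch.nn.Linear(64, 64)
z_t, z_next = torch.randn(32, 64), torch.randn(32, 64)
loss = contrastive_next_state_loss(z_t, z_next, predictor)
loss.backward()
```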


(Mal)adaptive Mentalising in the Cognitive Hierarchy, and Its Link to Paranoia

September 2024

·

56 Reads

·

1 Citation

Computational Psychiatry

Humans need to be on their toes when interacting with competitive others to avoid being taken advantage of. Too much caution out of context can, however, be detrimental and produce false beliefs of intended harm. Here, we offer a formal account of this phenomenon through the lens of Theory of Mind. We simulate agents of different depths of mentalizing within a simple game-theoretic paradigm and show how, if aligned well, deep recursive mentalization gives rise to both successful deception and reasonable skepticism. However, we also show that if a self is mentalizing too deeply – hyper-mentalizing – false beliefs arise that a partner is trying to trick them maliciously, resulting in a material loss to the self. Importantly, we show that this is only true when hyper-mentalizing agents believe observed actions are generated intentionally. This theory offers a potential cognitive mechanism for suspiciousness, paranoia, and conspiratorial ideation. Rather than a deficit in Theory of Mind, paranoia may arise from the application of overly strategic thinking to ingenuous behaviour.

Author Summary: Interacting competitively requires vigilance to avoid deception. However, excessive caution can have adverse effects, stemming from false beliefs of intentional harm. So far there is no formal cognitive account of what may cause this suspiciousness. Here we present an examination of this phenomenon through the lens of Theory of Mind – the cognitive ability to consider the beliefs, intentions, and desires of others. By simulating interacting computer agents, we illustrate how well-aligned agents can give rise to successful deception and justified skepticism. Crucially, we also reveal that overly cautious agents develop false beliefs that an ingenuous partner is attempting malicious trickery, leading to tangible losses. As well as formally defining a plausible mechanism for suspiciousness, paranoia, and conspiratorial thinking, our theory indicates that, rather than a deficit in Theory of Mind, paranoia may involve an over-application of strategy to genuine behaviour.
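
A minimal sketch of recursive depth-of-mentalising, assuming a hider/seeker (matching-pennies-like) game in which a depth-k player best-responds to a simulated depth-(k-1) partner. The bias parameter and payoff structure are illustrative assumptions, not the generative model used in the paper.

```python
# Toy cognitive-hierarchy sketch for a hider/seeker game; loosely illustrative of
# recursive mentalising, not the paper's model.
LEVEL0_BIAS = 0.7  # assumed bias of a naive (depth-0) player toward location 0

def best_response(opponent_prob_loc0: float, role: str) -> int:
    """The seeker wants to match the hider; the hider wants to mismatch the seeker."""
    if role == "seeker":
        return 0 if opponent_prob_loc0 > 0.5 else 1
    return 1 if opponent_prob_loc0 > 0.5 else 0

def choose(role: str, depth: int) -> int:
    """A depth-k player simulates a depth-(k-1) opponent and best-responds to it."""
    if depth == 0:
        return 0 if LEVEL0_BIAS > 0.5 else 1          # naive, biased play
    opponent_role = "hider" if role == "seeker" else "seeker"
    opponent_choice = choose(opponent_role, depth - 1)
    return best_response(1.0 if opponent_choice == 0 else 0.0, role)

for k in range(5):
    print(f"depth-{k} hider hides at {choose('hider', k)}, "
          f"depth-{k} seeker seeks at {choose('seeker', k)}")
```

Running the loop shows the choices alternating with depth: each extra level of recursion flips the best response to the level below, which is the basic dynamic that makes well-matched depths deceptive or skeptical and mismatched depths prone to false beliefs.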


Self-Other Generalisation Shapes Social Interaction and Is Disrupted in Borderline Personality Disorder

September 2024

·

251 Reads

Generalising information from ourselves to others, and from others to ourselves, allows for both a dependable source of navigation and adaptability in interpersonal exchange. Disturbances to social development in sensitive periods can cause enduring and distressing damage to lasting healthy relationships. However, identifying the mechanisms of healthy exchange has been difficult. We introduce a theory of self-other generalisation tested with data from a three-phase social value orientation task - the Intentions Game. We involved individuals with (n=50) and without (n=53) a diagnosis of borderline personality disorder and assessed whether self-other information generalisation may explain interpersonal (in)stability. Healthy controls initially used their preferences to predict others and were influenced by their partners, leading to self-other convergence. In contrast, individuals with borderline personality disorder maintained distinct self-other representations, generating a new neutral prior to begin learning. Both groups steadily reduced their updating over time, with healthy participants showing greater sensitivity when updating beliefs. Furthermore, we explored theory-driven individual differences underpinning learning. Overall, the findings provide a clear explanation of how self-other generalisation constrains and assists learning, show how childhood adversity may disrupt this through the separation of internalised beliefs, and make clear predictions about the mechanisms of social information integration under uncertainty.



Bayesian Priors in Active Avoidance

August 2024

·

22 Reads

Failing to make decisions that would actively avoid negative outcomes is central to helplessness. In a Bayesian framework, deciding whether to act is informed by beliefs about the world that can be characterised as priors. However, these priors have not been previously quantified. Here we administered two tasks in which participants decided whether to attempt active avoidance actions. The tasks differed in framing and valence, allowing us to test whether the prior generating biases in behaviour is problem-specific or task-independent and general. We performed extensive comparisons of models offering different structural explanations of the data, finding that a Bayesian model with a task-invariant prior for active avoidance provided the best fit to participants’ trial-by-trial behaviour. The parameters of this prior were reliable, and participants with an optimistic prior also reported higher levels of positive affect. These results show that individual differences in prior beliefs can explain decisions to engage in active avoidance of negative outcomes, providing evidence for a Bayesian conceptualization of helplessness.
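
As a hedged illustration of the idea that a prior over avoidance success can bias whether one acts at all, the sketch below uses a Beta-Bernoulli prior and a simple expected-value rule. The prior parameters, costs, and decision rule are assumptions for illustration and are not the model compared in the paper.

```python
# Hedged sketch: a Beta prior over the probability that an avoidance attempt succeeds,
# updated from observed outcomes, with action taken when its expected value beats inaction.
# Illustrative only; parameter names and costs are assumptions, not the paper's model.
def should_act(successes: int, failures: int,
               prior_a: float = 2.0, prior_b: float = 2.0,
               loss_if_not_avoided: float = -10.0, effort_cost: float = -3.0) -> bool:
    # Posterior mean probability that an avoidance attempt succeeds.
    p_success = (prior_a + successes) / (prior_a + prior_b + successes + failures)
    value_act = effort_cost + (1.0 - p_success) * loss_if_not_avoided
    value_wait = loss_if_not_avoided
    return value_act > value_wait

# An optimistic prior (high prior_a) keeps acting attractive even after early failures.
print(should_act(successes=0, failures=3))               # pessimistic evidence -> False
print(should_act(successes=0, failures=3, prior_a=8.0))  # optimistic prior -> True
```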


Citations (58)


... Being able to identify biases in cases of unreliable annotations is important, and researchers should resist the urge to withhold evaluable results from foundation models even if the data fail to reject a null hypothesis. By performing more rigorous evaluations, researchers could crowdsource measuring model biases and behavior tendencies to help all users be more discerning of speciousness, especially as these models' poor behaviors get harder to detect (Azaria et al., 2024; Hosking et al., 2024; Zhou et al., 2024) and as researchers make bolder claims about their abilities (see Binz et al. 2024, inter alia). ...

Reference:

"All that Glitters": Approaches to Evaluations with Unreliable Model and Human Annotations
Centaur: a foundation model of human cognition

... More recently, deep learning-based keypoint detection has become popular [1,3,10,14,19,39,46,48]. DeepLabCut [1] detects manually specified keypoints through transfer learning of 2D human pose estimation from monocular images [40]. ...

Lightning Pose: improved animal pose estimation via semi-supervised learning, Bayesian ensembling and cloud-native open-source tools
  • Citing Article
  • June 2024

Nature Methods

... For the Living Brain Project, a safe and scalable procedure was developed to acquire PFC tissue from living people for biomedical research purposes (Figure 1) [14][15][16][17] . Three previous reports on data or samples obtained for the LBP have made the following contributions: (1) using bulk RNA sequencing (bulk RNA-seq; i.e., the capture and quantification of pooled RNA from the cells of a sample), differences in RNA transcript expression between living and postmortem human PFC tissues were characterized and analyses were presented that investigated whether these differences impact the findings of RNA transcript expression studies that only use postmortem tissues 14 ; (2) using single-nucleus RNA sequencing (snRNA-seq; i.e., the capture and quantification of RNA from many individual nuclei in a tissue sample), the differences in RNA transcript expression between living and postmortem human PFC tissues were dissected and an approach to statistically account for these differences in analyses of data from postmortem samples was developed 17 ; (3) using intracranial measures of neurotransmission obtained while participants played a computer game that engaged higher-order brain functions, patterns of dopamine and serotonin fluctuation in the basal ganglia were identified that associated with social cognition 16 . ...

Dopamine and serotonin in human substantia nigra track social context and value signals during economic exchange

Nature Human Behaviour

... That is, a strong correlation between TD errors and the amount of dopamine or the firing rate of dopamine neurons, which affect memory and learning in organisms, has been reported (Schultz et al., 1993; O'Doherty et al., 2003; Starkweather and Uchida, 2021), and behavioral learning in organisms is also hypothesized to be based on RL (Dayan and Balleine, 2002; Doya, 2021). Recently, more detailed investigation of the relationship between TD errors and dopamine has revealed that it is not a simple linear relationship as suggested by standard TD learning, but is biased and nonlinear (Dabney et al., 2020; Muller et al., 2024). It has also been reported that some of the nonlinearities may stabilize learning performance (Hoxha et al., 2024). ...

Distributional reinforcement learning in prefrontal cortex

Nature Neuroscience
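
The snippet above refers to standard temporal-difference (TD) learning; for reference, a generic textbook TD(0) error and value update of the kind dopamine signals are usually compared against is sketched below (not tied to any specific study cited here).

```python
# Standard TD(0) error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
def td_error(reward: float, v_next: float, v_current: float, gamma: float = 0.95) -> float:
    return reward + gamma * v_next - v_current

# Tabular value update with learning rate alpha.
def td_update(v_current: float, delta: float, alpha: float = 0.1) -> float:
    return v_current + alpha * delta

delta = td_error(reward=1.0, v_next=0.0, v_current=0.4)
print(delta, td_update(0.4, delta))   # prints 0.6 and 0.46
```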

... Our approach in modeling behavior aims to descriptively characterize the relatively long time-scale dynamics of learning that would be required to correctly associate stimuli, actions, and outcomes, particularly in the absence of shaping, de-biasing, or other experimental protocols. This relates to previous modeling efforts of similar datasets; however, instead of focusing on trial-to-trial fluctuations in psychophysical weights 82 or the emergence of multi-state behavior 83 , we focus on session-level changes in psychophysical weights. We leveraged advances in MCMC [84][85][86] to infer a set of parameters and weights for Bernoulli generalized linear models (GLM) that were expressive enough to capture the full set of behaviors that mice in our task explored. ...

Dissecting the Complexities of Learning With Infinite Hidden Markov Models
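
The preceding snippet describes fitting Bernoulli generalized linear models of choice with psychophysical weights. A minimal sketch of such a GLM is shown below, fit here by simple gradient ascent rather than the MCMC methods the authors cite; the stimulus and bias regressors, and all parameter values, are hypothetical.

```python
import numpy as np

# Minimal Bernoulli GLM of the kind the snippet describes (psychophysical weights on a
# binary choice); fit by gradient ascent purely as a sketch, not by MCMC.
def fit_bernoulli_glm(X, y, lr=0.1, n_iter=2000):
    """X: (trials, features) design matrix; y: (trials,) binary choices."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted P(choice = 1)
        w += lr * X.T @ (y - p) / len(y)      # gradient of the Bernoulli log-likelihood
    return w

# Hypothetical session: choices driven by stimulus contrast plus a small bias term.
rng = np.random.default_rng(0)
contrast = rng.uniform(-1, 1, size=500)
X = np.column_stack([np.ones_like(contrast), contrast])   # [bias, stimulus]
true_w = np.array([0.2, 2.5])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)
print(fit_bernoulli_glm(X, y))   # recovered weights should be close to [0.2, 2.5]
```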

... CATs are both personalized and process-oriented, as are the therapeutic relationship and transference/ countertransference dynamics, thus considering subjective individual differences in aesthetic experiences, which differ significantly between individuals and within an individual over time (Brielmann et al., 2024). Intertwined individual factors related to personality, age, gender, physiological features, personal and sociocultural background, experience, preferences, and motivation, as well as contextual and situational factors tune the predictive apparatus toward salient objects and experiences (Kesner, 2014;Spee et al., 2022). ...

Modelling individual aesthetic judgements over time

... These do not capture the full range of defensive responses but do offer advantages in terms of the molecular and circuitlevel neurobiological techniques that can be used in animal models. Thus, the authors argue for the necessity of a multilevel approach to improve this translational aspect of neuroscience, ranging from animal models to examine fear and anxiety neurobiology to computational models to connect studies across species (Neville et al., 2023). They also advocate for the use of specialised models that share specific characteristics with humans to overcome some limitations of standard nonhuman animal models. ...

A primer on the use of computational modelling to investigate affective states, affective disorders and animal welfare in non-human animals

Cognitive Affective & Behavioral Neuroscience

... This begs the question as to whether ToM changes in paranoia and paranoia-affiliated diagnoses may be caused by changes in the maladaptive application of recursive cognition in social settings. There has been little work examining the role of cognitive recursion applied to BPD, paranoia, and persecutory delusions, aside from some notable exceptions [27,42], which did not focus on false belief generation or maintenance. Here, we offer an example of the ramifications of being adaptively and maladaptively strategic at different recursive levels. ...

Between prudence and paranoia: Theory of Mind gone right, and wrong
  • Citing Preprint
  • October 2023

... As is known, repeated presentation of the same stressor often results in adaptation and may decrease depressive and anxiety-like behaviors [22,23], a process that can be circumvented by applying a range of stressors in an unpredictable order [1]. Besides, previous research has demonstrated that chronic, uncontrollable stress can impair the brain's reward system [24]. Thus, the chronic stress animal model was developed to investigate neuropathology [25] and potential therapeutic targets [26,27] of depressive disorder. ...

Anxiety associated with perceived uncontrollable stress enhances expectations of environmental volatility and impairs reward learning