Christopher Summerfield’s research while affiliated with University of Oxford and other places


Publications (223)


Figure 1: Violin plot of exhaustive score distributions to the "greatest thing" prompt. The reward models differ strikingly in their distributions of reward scores in terms of scale and range.
Reward Model Interpretability via Optimal and Pessimal Tokens
  • Preprint
  • File available

June 2025

Brian Christian

·

Hannah Rose Kirk

·

Jessica A. F. Thompson

·

[...]

·

Tsvetomira Dumbalska

Reward modeling has emerged as a crucial component in aligning large language models with human values. Significant attention has focused on using reward models as a means for fine-tuning generative models. However, the reward models themselves -- which directly encode human value judgments by turning prompt-response pairs into scalar rewards -- remain relatively understudied. We present a novel approach to reward model interpretability through exhaustive analysis of their responses across their entire vocabulary space. By examining how different reward models score every possible single-token response to value-laden prompts, we uncover several striking findings: (i) substantial heterogeneity between models trained on similar objectives, (ii) systematic asymmetries in how models encode high- vs low-scoring tokens, (iii) significant sensitivity to prompt framing that mirrors human cognitive biases, and (iv) overvaluation of more frequent tokens. We demonstrate these effects across ten recent open-source reward models of varying parameter counts and architectures. Our results challenge assumptions about the interchangeability of reward models, as well as their suitability as proxies of complex and context-dependent human values. We find that these models can encode concerning biases toward certain identity groups, which may emerge as unintended consequences of harmlessness training -- distortions that risk propagating through the downstream large language models now deployed to millions.
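A minimal sketch of the exhaustive single-token scoring procedure described in the abstract, assuming a generic `score_fn(prompt, token)` callable that wraps whichever reward model is being probed; the actual models, prompts, and scoring interfaces used in the paper are not reproduced here.

```python
# Sketch of exhaustive single-token reward-model probing (illustrative assumptions).
# `score_fn` stands in for whatever reward model is under study: it maps a
# (prompt, response) pair to a scalar reward. `vocab` is the model's token list.
import numpy as np

def sweep_vocabulary(score_fn, prompt, vocab):
    """Score every possible single-token response to a value-laden prompt."""
    return np.array([score_fn(prompt, token) for token in vocab])

def summarise(scores, vocab, k=5):
    """Report scale/range statistics and the optimal / pessimal tokens."""
    order = np.argsort(scores)
    return {
        "mean": float(scores.mean()),
        "range": (float(scores.min()), float(scores.max())),
        "pessimal": [vocab[i] for i in order[:k]],   # lowest-scoring tokens
        "optimal": [vocab[i] for i in order[-k:]],   # highest-scoring tokens
    }

# Illustrative usage with a toy scoring function (not a real reward model):
if __name__ == "__main__":
    toy_vocab = ["kindness", "cruelty", "honesty", "deceit", "cake"]
    toy_score = lambda prompt, tok: float(len(tok)) - 5.0  # placeholder
    scores = sweep_vocabulary(toy_score, "What is the greatest thing?", toy_vocab)
    print(summarise(scores, toy_vocab, k=2))
```

In practice `score_fn` would run the reward model on the prompt concatenated with a single-token completion; the toy scorer above exists only to make the sketch runnable.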


Understanding human meta-control and its pathologies using deep neural networks

April 2025

·

1 Read

In mammals, neurons in the medial prefrontal cortex respond to action prediction errors (APEs). Here, using computational simulations with deep neural networks, we show that this error-monitoring process is crucial for inferring how controllable an environment is, and thus for estimating the value of control processes (meta-control). We trained both humans and deep reinforcement learning (RL) agents to perform a reward-guided learning task that required adaptation to changes in environmental controllability. Deep RL agents could only solve the task when designed to explicitly predict APEs, and when trained this way, they displayed signatures of meta-control that closely resembled those observed in humans. Moreover, when deep RL agents were trained to over- or under-estimate controllability, they developed behavioural pathologies matching those of humans who reported depressive, anxious or compulsive traits on transdiagnostic questionnaires. These findings open up new avenues for studying both healthy and pathological meta-control using deep neural networks.
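One way to make the role of action prediction errors concrete is the toy reduction below: an agent predicts its own next action, and a leaky average of its prediction errors serves as a controllability estimate. This is an illustrative sketch under simplifying assumptions, not the paper's deep RL architecture.

```python
# Hedged sketch: using action prediction errors (APEs) to track controllability.
import numpy as np

rng = np.random.default_rng(0)

def run_episode(controllable_prob, n_steps=200, lr=0.05):
    """In a controllable environment the executed action matches the intended
    (and predicted) one with high probability, so APEs are small. A leaky
    average of (1 - APE) serves as a simple controllability estimate."""
    controllability_estimate = 0.5
    for _ in range(n_steps):
        intended = rng.integers(2)                 # agent's intended / predicted action
        executed = intended if rng.random() < controllable_prob else rng.integers(2)
        ape = float(executed != intended)          # action prediction error
        controllability_estimate += lr * ((1.0 - ape) - controllability_estimate)
    return controllability_estimate

print(run_episode(0.9))   # high controllability -> estimate near 1
print(run_episode(0.5))   # lower controllability -> lower estimate
```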


Figure 2: Happiness boost after AI chatbot conversations compared to journaling on emotional topics. A) Happiness after conversations with AI chatbots (orange) or journaling (red). Happiness ratings range from 0 (very unhappy) to 100 (very happy), with 50 indicating a neutral state. Emotional topics are ordered by average happiness for the journaling condition. B) Differences in happiness between AI chatbot and journaling for each topic. C) The largest AI chatbot happiness boost (within-subjects) occurs for the most negative topics. Each of a participant's three chatbot conversations was categorized (best, middle, and worst) based on the average journaling happiness for those topics. The average difference between happiness after the chatbot conversation and the corresponding happiness for the journal condition is shown. D) The largest AI chatbot happiness boost (between-subjects) is for negative topics. Topics are grouped into positive (>50) or negative (<50) topics based on average journal happiness. The x-axis shows the participants' average happiness for each topic group. All error bars reflect ± 1 SEM. *p < .05, **p < .01, ***p < .001
Figure 3: Sentiment ratings for AI chatbot and user conversations. A) Comparison of average sentiment between AI chatbots and users across conversations. Here, 'users' refers to the participants in the AI chatbot conversations. External LLMs rated conversation-level chatbot and user sentiment for each conversation, and ratings were averaged per user. The x-axis shows the average user sentiment from 0 (very negative) to 10 (very positive) across conversations, while the y-axis shows the average chatbot sentiment. The chatbot exhibited a more positive sentiment than the user in 90% of cases. B) Relationship between AI chatbot and user sentiment across different topics. Linear regressions were conducted to predict the chatbot's sentiment based on the user's sentiment. The x-axis shows the resulting beta coefficient while the y-axis lists the topic labels. C) Independent influence of AI chatbot and user sentiment on post-conversation happiness. A mixed-effects regression, controlling for each sentiment variable, was used to predict post-conversation happiness. All error bars reflect ±1 SEM. **p < .01, ***p < .001.
Figure 4: Bidirectional relationship between chatbot and user sentiment within conversations. A) Sentiment increases within conversations for both chatbot and users. Here, 'users' refers to the participants in the AI chatbot conversations. The x-axis represents the sequential utterance pair number, marking the conversation's progression through pairs of alternating user and chatbot responses. The y-axis displays sentiment scores. The histogram at the top shows the relative frequency of utterance pairs. B) Change in user sentiment from first to last utterance. The x-axis shows the difference in sentiment rating between the last and first user utterance, organized by topic (y-axis). C) Bidirectional influences between chatbot and user sentiment. The graph shows beta coefficients from two cross-lagged regressions examining how the prior sentiment of one party influences the current sentiment of the other, for both user and chatbot. D) Sentiment mirroring changes across topics. The two cross-lagged regressions described in panel C were fitted for each topic separately. We calculated the relative importance of each predictor (prior chatbot or user sentiment) on current sentiment (chatbot or user). The x-axis shows relative importance as a percentage, while the y-axis lists topics. All error bars reflect ±1 SEM. *p < .05, **p < .01, ***p < .001.
Increasing happiness through conversations with artificial intelligence

April 2025

·

90 Reads

Chatbots powered by artificial intelligence (AI) have rapidly become a significant part of everyday life, with over a quarter of American adults using them multiple times per week. While these tools offer potential benefits and risks, a fundamental question remains largely unexplored: How do conversations with AI influence subjective well-being? To investigate this, we conducted a study where participants either engaged in conversations with an AI chatbot (N = 334) or wrote journal entries (N = 193) on the same randomly assigned topics and reported their momentary happiness afterward. We found that happiness after AI chatbot conversations was higher than after journaling, particularly when discussing negative topics such as depression or guilt. Leveraging large language models for sentiment analysis, we found that the AI chatbot mirrored participants' sentiment while maintaining a consistent positivity bias. When discussing negative topics, participants gradually aligned their sentiment with the AI's positivity, leading to an overall increase in happiness. We hypothesized that the history of participants' sentiment prediction errors, the difference between expected and actual emotional tone when responding to the AI chatbot, might explain this happiness effect. Using computational modeling, we find that the history of these sentiment prediction errors over the course of a conversation predicts greater post-conversation happiness, demonstrating a central role of emotional expectations during dialogue. Our findings underscore the effect that AI interactions can have on human well-being.
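The sentiment-prediction-error account can be sketched as follows, under the simplifying assumption that the expected emotional tone of the chatbot's next utterance is a leaky average of its previous sentiment ratings; the paper's actual computational model and regression specification are not reproduced here.

```python
# Sketch: cumulative sentiment prediction errors over a conversation.
# Assumption (illustrative only): the expected sentiment of the next chatbot
# utterance is a leaky average of previously observed chatbot sentiment.
import numpy as np

def sentiment_prediction_errors(chatbot_sentiment, alpha=0.5):
    """chatbot_sentiment: per-utterance sentiment ratings (e.g. a 0-10 scale)."""
    expected = chatbot_sentiment[0]
    errors = []
    for s in chatbot_sentiment[1:]:
        errors.append(s - expected)          # positive error = nicer than expected
        expected += alpha * (s - expected)   # update the expectation
    return np.array(errors)

# A conversation in which the chatbot is more positive than the user expects:
errs = sentiment_prediction_errors([4.0, 6.0, 7.0, 7.5, 8.0])
print(errs, errs.sum())  # this error history is hypothesised to predict post-conversation happiness
```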


Illustration of the game
A Two rounds (denoted t) are illustrated (columns). In each, a mechanism allocates resources (blue flowers) from a pool of size R to p = 4 players, who each choose a quantity to reciprocate, with any remainder going to surplus (gold coins). For example, in the schematic, in round t the first player (left) receives 2 flowers and reciprocates 1, generating a surplus of r = 0.4 (amount due to growth factor shown in grey). The pool size is depleted by the allocation and replenished by the reciprocations. Note that players who receive no resources cannot reciprocate (e.g. centre-left player on round t). B Illustration of our approach. First, (1) we collected data from human participants under a range of mechanisms defined by different values of w, and used imitation learning to create clones that behaved like people. Then (2) we used these clones to train the RL agent, and (3) conducted Exp. 1, in which we compared the RL agent to baselines. Next, (4) we analysed the RL agent policy, and constructed a heuristic approximation that was more explainable (the 'interpolation baseline'), which (5) we tested on behavioural clones, and (6) compared to the RL agent (and proportional mechanism) in Exp. 2. Finally, (7) we used all of the data so far to retrain a new version of the RL agent, and (8) compared it to the interpolation baseline in Exp. 3. C Example games (using behavioural clones). The game starts with R₀ = 200. Left: offers (full lines) and reciprocations (dashed lines) to four players (lower panels) over 40 trials (x-axis) in an example game with the equal baseline. The grey shaded area indicates where each player receives an offer of zero. The top panel shows the size of the pool (blue) and the total per-trial surplus (red). The middle and right panels show example games under the proportional baseline and the RL agent, respectively. Note that in the example proportional baseline game, three players fall into poverty traps, leaving a single player to contribute, and increasing inequality.
Results of first trained mechanism against baselines
A The surplus and Gini coefficient generated from games played under three baseline mechanisms (blue, purple and red) and the RL agent (green), for virtual players (left panel) and human participants (right panel) in Exp. 1. Each small dot is a game; the larger dot is the mean over games. B Correspondences between predicted outcomes (from virtual players, shading) and observed outcomes in Exp. 1 (dots) for each baseline mechanism and the RL agent. Shown separately for games that were sustained to the end by at least one player (colours) and those where the pool was exhausted prematurely (grey). C The average Gini coefficient of the offer made to players as a function of the pool size, for individual trials (grey dots) and games (coloured dots), both for behavioural clones (upper panels) and human data in Exp. 1 (lower panels). D Exclusions occur when a player receives nothing for one or more consecutive rounds. Here, we plot the duration of exclusions against the trial on which they are initiated (dots). Points on the diagonal indicate that the player was never reincluded (exclusion lasts until trial 40). The superimposed coloured histograms show the count of exclusions for each duration bin (of width 2). Note that, unlike the baselines, the RL agent excludes frequently but for short durations. E The offer made by each mechanism to each player as a function of the lagged contribution of that player over adjacent trials. Dots are individual coefficients; black line is the median.
Results of first trained mechanisms against novel, interpretable mechanism
A The family of exponential functions that determine w as a function of pool size (grey lines) and the one that produced the highest surplus with virtual players (yellow line; (R/200)²²). B The surplus and Gini coefficient generated from games played under two baseline mechanisms (proportional, red; interpolating, yellow) and the RL agent (green), for virtual players (left panel) and human participants (right panel) in Exp. 2. Each small dot is a game; the larger dot is the mean over games. C Same as Fig. 2C but for Proportional, Interpolating and RL agents. D Same as Fig. 2E. E Same as Fig. 2D.
Results of second trained mechanism
A The surplus and Gini coefficient generated from games played under the interpolating baseline (yellow) and the RL agent M2 (light green), for virtual players (left panel) and human participants (right panel) in Exp. 3. Each small dot is a game; the larger dot is the mean over games. B Average reported preference for the interpolating (yellow) and RL agent M2 (green) mechanisms on a number of dimensions. C Same as A; results from human participants in Exp. 4, in which groups of players play three consecutive games.
Deep reinforcement learning can promote sustainable human behaviour in a common-pool resource problem

March 2025

·

22 Reads

·

1 Citation

A canonical social dilemma arises when resources are allocated to people, who can either reciprocate with interest or keep the proceeds. The right resource allocation mechanisms can encourage levels of reciprocation that sustain the commons. Here, in an iterated multiplayer trust game, we use deep reinforcement learning (RL) to design a social planner that promotes sustainable contributions from human participants. We first trained neural networks to behave like human players, creating a simulated economy that allows us to study the dynamics of receipt and reciprocation. We use RL to train a mechanism to maximise aggregate return to players. The RL mechanism discovers a redistributive policy that leads to a large but also more equal surplus. The mechanism outperforms baseline mechanisms by conditioning its generosity on available resources and temporarily sanctioning defectors. Examining the RL policy allows us to develop a similar but explainable mechanism that is more popular among players.
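A simplified, self-contained simulation of the game dynamics described above (pool, allocation, reciprocation with a growth factor, surplus) together with the Gini coefficient used to quantify inequality. The allocation policy here is a plain proportional baseline with placeholder parameters, not the trained RL mechanism.

```python
# Hedged sketch of the common-pool trust game and the Gini coefficient.
# The allocation policy below is a simple proportional baseline; all parameter
# values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

def gini(x):
    """Standard Gini coefficient of a non-negative allocation vector."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum() + 1e-12)

def play_game(n_players=4, pool=200.0, n_rounds=40, growth=1.4):
    """Each round: allocate part of the pool, players reciprocate a fraction,
    reciprocations grow by `growth` and replenish the pool; what players keep
    accumulates as surplus."""
    contributions = np.ones(n_players)
    total_surplus = 0.0
    for _ in range(n_rounds):
        weights = contributions / contributions.sum()        # proportional allocation
        offers = pool * 0.2 * weights                         # spend 20% of the pool per round
        reciprocated = offers * rng.uniform(0.3, 0.9, n_players)
        total_surplus += (offers - reciprocated).sum()
        pool += growth * reciprocated.sum() - offers.sum()
        contributions = reciprocated + 1e-6
        if pool <= 0:
            break
    return total_surplus, gini(offers)  # surplus and inequality of final-round offers

print(play_game())
```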


Humans and neural networks show similar patterns of transfer and interference during continual learning

February 2025

·

1 Read

Learning multiple tasks in succession is a challenge for artificial and biological agents alike. However, it is often claimed that artificial agents fail to learn new tasks without overwriting previous ones, while humans succeed. Here, using a canonical sequential rule-learning task in which learners acquire Task A, then learn Task B, and are finally re-tested on Task A, we see conserved patterns of transfer and interference in humans and neural networks. When successive tasks are similar (compared to dissimilar), both types of learner benefit more from transferring prior knowledge to Task B, but demonstrate greater interference when retested on Task A. Examining the hidden representations in neural networks, we observe this interference arises because networks utilise their existing solutions to accelerate learning of similar tasks, whilst corrupting those solutions for their previous use. In humans, we also observe striking individual differences where some participants ('lumpers') show interference after sequentially learning two similar tasks, while others ('splitters') avoid interference. These two strategies are associated with opposite patterns of performance benefits: 'lumpers' are better at learning shared structure across stimuli and generalising to new settings, while 'splitters' are better at remembering unique features of the stimuli. By varying the training regime for neural networks ('rich' versus 'lazy'), we can recreate these differences, pushing networks towards low-dimensional representations which capitalise on shared structure, or high-dimensional representations maximising discrimination between inputs. Overall, our findings reveal shared computational principles relating transfer and interference in both systems, governed by global factors like task similarity and individual differences in structuring knowledge.
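A compact sketch of the A → B → A protocol with a small network, where the 'rich' versus 'lazy' regimes are toggled via the initial weight scale (a common operationalisation); the study's exact tasks, architectures, and hyperparameters are assumptions here.

```python
# Sketch: A -> B -> A training to measure transfer and interference.
import numpy as np

rng = np.random.default_rng(0)

def make_task(n=200, d=10, seed=0):
    r = np.random.default_rng(seed)
    X = r.normal(size=(n, d))
    w = r.normal(size=d)
    return X, (X @ w > 0).astype(float)     # a random linear classification rule

def train(net, X, y, epochs=200, lr=0.1):
    W1, w2 = net
    for _ in range(epochs):
        h = np.tanh(X @ W1)
        p = 1 / (1 + np.exp(-(h @ w2)))
        err = p - y                                   # gradient of cross-entropy w.r.t. logits
        w2 -= lr * h.T @ err / len(y)
        W1 -= lr * X.T @ ((err[:, None] * w2) * (1 - h**2)) / len(y)
    return W1, w2

def accuracy(net, X, y):
    W1, w2 = net
    p = 1 / (1 + np.exp(-(np.tanh(X @ W1) @ w2)))
    return ((p > 0.5) == y).mean()

def run(init_scale):
    d, hidden = 10, 64
    net = [rng.normal(size=(d, hidden)) * init_scale, rng.normal(size=hidden) * init_scale]
    XA, yA = make_task(seed=1)
    XB, yB = make_task(seed=2)               # a second, here dissimilar, task
    net = train(net, XA, yA)
    acc_A_before = accuracy(net, XA, yA)
    net = train(net, XB, yB)
    acc_A_after = accuracy(net, XA, yA)      # drop on re-test = interference
    return acc_A_before, acc_A_after

print("rich (small init):", run(0.05))
print("lazy (large init):", run(1.0))
```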


Figure 3: Consensus Evaluation. Either one participant's critique (top), or all participants' critiques (bottom), are substituted with critiques sampled from their respective digital representatives π̂_i. Left: Mean discrepancy in payoffs between the revised consensus generated by the mediator mechanism using participants' ground-truth critiques, versus using critiques sampled from DRs. Replacing either one or all ground-truth critiques with vanilla DR samples results in significantly degraded payoffs, whereas fine-tuned DR samples yield payoffs with little or no degradation, indicating that those consensuses are more or less equivalent. Right: The autorater's win-rate of a generated revised consensus (using critiques sampled from different models π̂_i) against the revised consensus generated using ground-truth critiques (i.e. the baseline). The golden bar represents the baseline consensus against itself, serving as a reference for ceiling performance. Substituting a single critique (chosen at random) results in consensuses perceived as roughly equally similar across all DRs by the autorater (13% difference between ceiling and Vanilla 1B DRs). This is likely a property of the mediation mechanism, which we empirically observe disregards outlier critiques. However, substituting the entire group's critiques significantly influences the revised consensus output (30% difference between ceiling and Vanilla 1B DRs), with the 30B fine-tuned DRs having a notably higher win-rate here. Conclusion: Using the notion of representativity motivated earlier, in both top and bottom panels we also observe that fine-tuning and scale both improve the representativity of DRs for held-out episodes. Payoffs are estimated using a payoff model accompanying the work in [5, 6], based on a 1B-parameter Chinchilla [22], that outputs a scalar "agreement" score for each participant.
Figure 4: Variants of Digital Representatives. Mean log-likelihood of ground-truth critiques from human participants (from the validation set), evaluated under various digital representatives. All these DRs have 1B parameters and were fine-tuned on datasets conditioned on diverse additional information, as indicated on the x-axis (refer to Table 2 for details and examples). The optimal variant (Base+O+C) incorporates participant i's opinions and critiques to other questions. This variant performs very similarly to its counterpart that additionally includes demographic information (Base+D+O+C). However, we opted for the former for simplicity. Note that variants based solely on demographics (Base+D) or position scoring (Base+P) perform much worse. This suggests that integrating participant-specific few-shot information enhances both task- and self-consistency.
Language Agents as Digital Representatives in Collective Decision-Making

February 2025

·

23 Reads

Consider the process of collective decision-making, in which a group of individuals interactively select a preferred outcome from among a universe of alternatives. In this context, "representation" is the activity of making an individual's preferences present in the process via participation by a proxy agent -- i.e. their "representative". To this end, learned models of human behavior have the potential to fill this role, with practical implications for multi-agent scenario studies and mechanism design. In this work, we investigate the possibility of training language agents to behave in the capacity of representatives of human agents, appropriately expressing the preferences of those individuals whom they stand for. First, we formalize the setting of collective decision-making -- as the episodic process of interaction between a group of agents and a decision mechanism. On this basis, we then formalize the problem of digital representation -- as the simulation of an agent's behavior to yield equivalent outcomes from the mechanism. Finally, we conduct an empirical case study in the setting of consensus-finding among diverse humans, and demonstrate the feasibility of fine-tuning large language models to act as digital representatives.
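The substitution-based evaluation illustrated in Figure 3 above can be sketched schematically as follows; `mechanism`, `dr_sample`, and `payoff` are hypothetical stand-ins for the consensus mechanism, a digital representative's sampling function, and the payoff model.

```python
# Sketch of the critique-substitution evaluation for digital representatives (DRs).
# All three callables below are placeholders, not a real API.
from statistics import mean

def payoff_discrepancy(opinions, critiques, dr_sample, mechanism, payoff, substitute_all=False):
    """Compare payoffs when ground-truth critiques are replaced by DR samples.

    opinions[i], critiques[i]: participant i's stated opinion and critique.
    dr_sample(i, opinion): a critique sampled from participant i's DR.
    mechanism(opinions, critiques): returns a revised consensus statement.
    payoff(opinion, consensus): scalar agreement score for one participant.
    """
    baseline = mechanism(opinions, critiques)
    base_payoffs = [payoff(p, baseline) for p in opinions]
    targets = [None] if substitute_all else range(len(opinions))
    discrepancies = []
    for i in targets:
        subbed = [dr_sample(j, opinions[j]) if (i is None or j == i) else critiques[j]
                  for j in range(len(opinions))]
        revised = mechanism(opinions, subbed)
        sub_payoffs = [payoff(p, revised) for p in opinions]
        discrepancies.append(mean(abs(a - b) for a, b in zip(base_payoffs, sub_payoffs)))
    return mean(discrepancies)
```

With toy callables plugged in, this function reproduces the one-critique versus all-critiques substitution contrast shown in the figure above.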


Flexible task abstractions emerge in linear networks with fast and bounded units

January 2025

·

6 Reads

·

1 Citation

Animals survive in dynamic environments changing at arbitrary timescales, but such data distribution shifts are a challenge to neural networks. To adapt to change, neural systems may change a large number of parameters, which is a slow process involving forgetting past information. In contrast, animals leverage distribution changes to segment their stream of experience into tasks and associate them with internal task abstractions. Animals can then respond flexibly by selecting the appropriate task abstraction. However, how such flexible task abstractions may arise in neural systems remains unknown. Here, we analyze a linear gated network where the weights and gates are jointly optimized via gradient descent, but with neuron-like constraints on the gates including a faster timescale, nonnegativity, and bounded activity. We observe that the weights self-organize into modules specialized for tasks or sub-tasks encountered, while the gating layer forms unique representations that switch in the appropriate weight modules (task abstractions). We analytically reduce the learning dynamics to an effective eigenspace, revealing a virtuous cycle: fast-adapting gates drive weight specialization by protecting previous knowledge, while weight specialization in turn increases the update rate of the gating layer. Task switching in the gating layer accelerates as a function of curriculum block size and task training, mirroring key findings in cognitive neuroscience. We show that the discovered task abstractions support generalization through both task and subtask composition, and we extend our findings to a non-linear network switching between two tasks. Overall, our work offers a theory of cognitive flexibility in animals as arising from joint gradient descent on synaptic and neural gating in a neural network architecture.
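A minimal sketch of such a gated linear network: the output is a gate-weighted sum of linear modules, weights and gates are optimised jointly by gradient descent, and the gates get a faster learning rate with nonnegativity and boundedness enforced by clipping. The loss, task sequence, and dimensions are illustrative assumptions, not the paper's setup.

```python
# Sketch of a linear gated network with fast, nonnegative, bounded gates.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_modules = 5, 3, 2
W = rng.normal(scale=0.1, size=(n_modules, d_out, d_in))   # slow weights (modules)
g = np.full(n_modules, 0.5)                                # fast gates

def forward(x):
    return np.einsum("k,kij,j->i", g, W, x)                # gate-weighted sum of linear maps

def step(x, y_target, lr_w=0.01, lr_g=0.5):
    global g, W
    err = forward(x) - y_target                            # gradient of 0.5*||y - y*||^2
    W -= lr_w * np.einsum("k,i,j->kij", g, err, x)         # slow update of the weight modules
    g -= lr_g * np.einsum("kij,i,j->k", W, err, x)         # gates adapt on a faster timescale
    g = np.clip(g, 0.0, 1.0)                               # nonnegative, bounded gate activity

# Alternate between two linear tasks in blocks; the gates come to index the task.
tasks = [rng.normal(size=(d_out, d_in)) for _ in range(2)]
for block in range(20):
    T = tasks[block % 2]
    for _ in range(100):
        x = rng.normal(size=d_in)
        step(x, T @ x)
    print(block % 2, np.round(g, 2))
```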


Flexible task abstractions emerge in linear networks with fast and bounded units

November 2024

·

10 Reads

Animals survive in dynamic environments changing at arbitrary timescales, but such data distribution shifts are a challenge to neural networks. To adapt to change, neural systems may change a large number of parameters, which is a slow process involving forgetting past information. In contrast, animals leverage distribution changes to segment their stream of experience into tasks and associate them with internal task abstractions. Animals can then respond flexibly by selecting the appropriate task abstraction. However, how such flexible task abstractions may arise in neural systems remains unknown. Here, we analyze a linear gated network where the weights and gates are jointly optimized via gradient descent, but with neuron-like constraints on the gates including a faster timescale, nonnegativity, and bounded activity. We observe that the weights self-organize into modules specialized for tasks or sub-tasks encountered, while the gating layer forms unique representations that switch in the appropriate weight modules (task abstractions). We analytically reduce the learning dynamics to an effective eigenspace, revealing a virtuous cycle: fast-adapting gates drive weight specialization by protecting previous knowledge, while weight specialization in turn increases the update rate of the gating layer. Task switching in the gating layer accelerates as a function of curriculum block size and task training, mirroring key findings in cognitive neuroscience. We show that the discovered task abstractions support generalization through both task and subtask composition, and we extend our findings to a non-linear network switching between two tasks. Overall, our work offers a theory of cognitive flexibility in animals as arising from joint gradient descent on synaptic and neural gating in a neural network architecture.



Do humans learn like transformer networks?

October 2024

·

29 Reads

Do humans learn like transformers? We trained both humans and transformer networks on a rule learning task where they had to respond to a query at the end of a sequence of symbols (the context). At test, we measured both “in-context” learning (the ability to generalise the rule to novel queries) and “in-weights” learning (the recall of past experiences from memory). Manipulating the diversity and redundancy of examples in the training distribution, we found that humans and transformers respond in very similar ways. In both types of learner, redundancy and diversity trade off in driving in-weights and in-context learning respectively, whereas a composite distribution that includes a balanced mix of redundancy and diversity allows the two strategies to be used in tandem. However, we also found that while humans benefit from dynamic training schedules that emphasise diverse examples early on, transformers do not. So, whilst the same data distributional properties promote learning in humans and transformer networks, only people benefit from curricula.
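One way the redundancy/diversity manipulation might be operationalised is sketched below: 'redundant' trials reuse a small fixed set of symbol-label pairings (favouring in-weights memorisation), while 'diverse' trials draw fresh pairings each time (favouring in-context use of the rule). The symbols, rule, and mixing parameter are placeholders, not the study's stimuli.

```python
# Sketch: building training trials whose redundancy/diversity is controlled.
# Each trial is a context of symbol->label pairs plus a query; the "rule" is that
# the query's label matches its partner in the context (illustrative only).
import random

SYMBOLS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]

def make_trial(pool):
    """Build one trial from a pool of (symbol, label) pairs."""
    context = random.sample(pool, 3)
    query, label = random.choice(context)
    return {"context": context, "query": query, "target": label}

def make_dataset(n_trials, diversity):
    """diversity in [0, 1]: fraction of trials drawn from fresh random pairings
    rather than from a small, repeated (redundant) set of exemplars."""
    redundant_pool = [(s, random.randint(0, 1)) for s in SYMBOLS[:4]]   # few items, repeated often
    trials = []
    for _ in range(n_trials):
        if random.random() < diversity:
            fresh_pool = [(s, random.randint(0, 1)) for s in random.sample(SYMBOLS, 6)]
            trials.append(make_trial(fresh_pool))        # supports in-context learning
        else:
            trials.append(make_trial(redundant_pool))    # supports in-weights learning
    return trials

print(make_dataset(3, diversity=0.5))
```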


Citations (48)


... This approach not only improves flexibility and speeds up adaptation, but also protects the underlying feature extraction from drastic changes, mitigating catastrophic forgetting. However, whether this overall approach can be extended to high-dimensional, image-based tasks, such as those inspired by the WCST, remains to be seen (Sandbrink et al., 2025; Bellman, 1961). ...

Reference:

Sparks of cognitive flexibility: self-guided context inference for flexible stimulus-response mapping by attentional routing
Flexible task abstractions emerge in linear networks with fast and bounded units

... Breakthrough capabilities often require compositional reasoning (Srivastava et al., 2023; Löwe et al., 2024; Chen et al., 2024); we consider the compositional property of length generalization (Graves et al., 2016; Kaiser & Sutskever, 2015; Lake & Baroni, 2018; Hupkes et al., 2020). Below we outline our experimental setup, with further details in Appendix A. Architecture: In our synthetic experiments, we train decoder-only Transformer models from scratch using rotary position embeddings (RoPE). ...

Abrupt and spontaneous strategy switches emerge in simple regularised neural networks

... The debate over AI regulation in political communication is integral to this cyclical progression. While well-designed AI tools can foster a more inclusive discourse and help groups find common ground, their adoption raises significant concerns regarding transparency and accountability (Tessler et al., 2024). The transparent use of AI-generated content may build trust, but poorly designed or undisclosed systems risk eroding public confidence (Kreps & Jakesch, 2023). ...

AI can help humans find common ground in democratic deliberation
  • Citing Article
  • October 2024

Science

... These experiences foster essential building blocks of abstract thinking, including pattern recognition, spatial reasoning, and causal inference [8,9]. In cognitive science, games are used as experimental platforms to reveal the inductive biases of the human mind [2,3], such as planning depth in the game Four-in-a-Row [63], or the cognitive basis of tool use through the game Virtual Tools [4]. ...

Using games to understand the mind
  • Citing Article
  • June 2024

Nature Human Behaviour

... This formulation connects the execution of desired actions to the achievability of desired outcomes. Specifically, our formalism closely matches the original conceptualization of self-efficacy in psychology [26], and the equivalents of our noise probability ϵ have indeed been used previously as operationalizations of self-inefficacy in experimental studies [18,80]. Theorem 2 below shows that, for all ℓ, the perceived degree-ℓ empowerment of state s indeed decreases with increasing probability ϵ of the execution noise. ...

Modelling cognitive flexibility with deep neural networks
  • Citing Article
  • June 2024

Current Opinion in Behavioral Sciences

... By contrast, higher-order processes, such as abstract categorization (Cheadle et al., 2014; Wyart et al., 2012, 2015) or rule discovery (Maheu et al., 2022), appear to require symbolic representations that are detached from the sensory features on which they are constructed. These symbolic representations enable cognitive transformations, such as compression (Al Roumi et al., 2021), reversal (Xie et al., 2022), reordering (Albouy et al., 2022), or the transfer of temporal structure across contexts (Garvert et al., 2017; Lerousseau and Summerfield, 2024; Whittington et al., 2020). Our task formally tests this distinction through two competing hypotheses: ...

Space as a Scaffold for Rotational Generalisation of Abstract Concepts
  • Citing Preprint
  • January 2024

... We conjectured that compositional generalization of structural knowledge could be enabled by a previously described mechanism that proposed abstraction of structural representations from sensory experiences within the hippocampal-entorhinal system for efficient generalization to novel situations (Behrens et al., 2018; Whittington et al., 2020, 2022). In other words, knowledge generalization involves mapping of inferred generative causes of an ongoing experience onto a structural scaffolding that supports abstractions over concrete events (Pesnot Lerousseau & Summerfield, 2024; Whittington et al., 2020). We propose that humans identify, neurally represent, and abstract the constitutive relational units of experience ("building blocks"). ...

Space as a Scaffold for Rotational Generalisation of Abstract Concepts
  • Citing Preprint
  • January 2024

eLife

... In this paper, we trained artificial recurrent neural networks (RNNs) to perform a color delayed-response task, where the color in each trial was sampled from a prior distribution with a few high-probability colors (common colors) [9, 25-27]. We found that the trained RNNs exhibited smaller memory errors on common colors, which aligns with previous behavioral experiments [1]. ...

A recurrent neural network model of prefrontal brain activity during a working memory task

... The dorsolateral prefrontal cortex (DLPFC) has been proposed as a region that encodes policies [11,12] and supports context-dependent action [13-17]. Based on this literature, DLPFC is a candidate region in which the encoding of successful past policies could be detected when people are exposed to new situations. The medial temporal lobe (MTL) and orbitofrontal cortex (OFC) have been proposed as regions that encode predictive representations about future states [18-22]. Based on this literature, MTL and OFC are candidate regions in which features associated with successful past policies might be detected. ...

Goal-seeking compresses neural codes for space in the human hippocampus and orbitofrontal cortex
  • Citing Article
  • September 2023

Neuron

... Several factors could have a strong impact on whether irrelevant stimuli affect task performance. For example, levels of neural noise in the circuit as well as energy constraints and the metabolic costs of overall neural activity (Flesch et al., 2022; Whittington et al., 2022; Sussillo et al., 2015; Orhan and Ma, 2019; Löwe, 2023; Cueva and Wei, 2018; Luo et al., 2023; Kao et al., 2021; Deneve et al., 2001; Barak et al., 2013) can affect how stimuli are represented in a neural circuit. Indeed, both noise and metabolic costs are factors that biological circuits must contend with (Tomko and Crapper, 1974; Laughlin, 2001; Churchland et al., 2006; Hasenstaub et al., 2010). ...

Regularised neural networks mimic human insight
  • Citing Conference Paper
  • January 2023