Timothy H. Muller’s research while affiliated with University of Oxford and other places


Publications (18)


Timeseries demonstrating value-related variance
The mean across neurons of the coefficient of partial determination (CPD) for value (cued probability) over time, following cue onset. The CPD measures how much variance in each neuron’s firing is explained by a given regressor (see below). This timeseries validates the 200–600 ms post-onset window that we used in order to match that used in Dabney, Kurth-Nelson et al.¹⁵, because that window very closely matches the peak value-related coding in the timeseries (as in Kennerley et al.⁸). We therefore used 200–600 ms and did not try any other time windows in order to avoid any possible p-hacking. Nonetheless we note that in a window that captures this peak, defined as when the CPD is higher than two thirds of the maximum CPD (270–620 ms), the core correlation between reversal point and asymmetric scaling was significant, R = 0.38, P = 0.019. Shaded region is the SEM across neurons. Note, as in Kennerley et al.⁸, the CPD for regressor Xi is defined by CPD(Xi) = [SSE(X−i) − SSE(X−i, Xi)] / SSE(X−i), where SSE(X) is the sum of squared errors in a regression model that includes a set of regressors X, and X−i is the set of all the regressors included in the model except Xi. The CPD for Xi is more positive if Xi explains more variance in neuronal firing.
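The CPD definition above amounts to two nested least-squares fits. The sketch below is a minimal illustration of that formula on simulated data; the variable names, shapes and design matrix are assumptions for illustration, not the paper's code.

```python
import numpy as np

def cpd(y, X_full, i):
    """Coefficient of partial determination for regressor column i:
    CPD(Xi) = [SSE(X-i) - SSE(X-i, Xi)] / SSE(X-i).
    y: (n_trials,) firing rates; X_full: (n_trials, n_regressors) design matrix."""
    def sse(X):
        # least-squares fit, then residual sum of squared errors
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid
    X_minus_i = np.delete(X_full, i, axis=1)     # model without regressor i
    sse_reduced = sse(X_minus_i)
    sse_full = sse(X_full)
    return (sse_reduced - sse_full) / sse_reduced
```

Applied in a sliding window over time, the per-neuron CPD timeseries for the value regressor yields the trace described above.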
Reason for Z-scoring to fit the reversal points
a–c) Three example neurons’ firing rate plotted as a function of time since cue onset, and split according to the four value levels, showing that some neurons increase their firing relative to baseline pre-stimulus firing rate for all reward levels (a), others increase or decrease it depending on the reward level (b), and others decrease it for all reward levels (c). Shaded regions denote SEM. The reason for z-scoring in our data is as follows. In dopamine neurons, it appears that any firing rate deviation from baseline activity (that is, pre-stimulus onset activity) signals a reward prediction error. This is not true for cortical neurons, which may, for example, increase (as in a) or decrease (c) their firing to all reward levels. If this is the case, then deviation from baseline cannot be assumed to denote an RPE. That some neurons either increase or decrease their firing to all reward levels is indicative of the heterogeneous coding schemes evident in PFC neurons. Given this, we can isolate the component of the activity that is associated with RPE by calculating z-scores, using deviation from mean firing to capture the same effect and compute reversal points. Therefore our reversal point measure captures, for each neuron, the relative differences in responses to different reward levels (that is, the non-linearity) that indicate optimism, rather than being affected by overall shifts in firing. These reversal points, that is, the value at which the neuron’s firing reverses from below to above the mean firing in the epoch, are an index of neuron optimism; the higher the reversal point, the more optimistic the neuron, and neurons with reversal points above 2.5 are optimistic and below 2.5 are pessimistic (Methods).
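The procedure described above (z-score firing across trials, then interpolate the value at which the response crosses the mean) can be sketched as follows. The response profiles and level coding are illustrative assumptions, not the recorded data.

```python
import numpy as np

def reversal_point(firing, values, levels=(1, 2, 3, 4)):
    """Interpolated value at which the z-scored response crosses zero
    (i.e. crosses the neuron's mean firing in the epoch).
    firing: per-trial rates; values: per-trial value level (1-4).
    Assumes a broadly monotonic response profile across levels."""
    z = (firing - firing.mean()) / firing.std()
    means = np.array([z[values == v].mean() for v in levels])
    # find where the mean z-scored response crosses zero between levels
    for k in range(len(levels) - 1):
        lo, hi = means[k], means[k + 1]
        if lo <= 0 <= hi or hi <= 0 <= lo:
            frac = -lo / (hi - lo)                     # linear interpolation
            return levels[k] + frac * (levels[k + 1] - levels[k])
    # no crossing: clamp to the boundary of the value range
    return levels[0] if means[0] > 0 else levels[-1]
```

On this index, a reversal point above 2.5 (the midpoint of the four levels) marks an optimistic neuron, as in the text.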
Results hold with a different measure of non-linearity at choice
a) Histogram showing diverse quadratic betas. b) Histogram showing the log p-values for consistency of these quadratic betas across partitions and the corresponding geometric mean. c) Pearson correlation between asymmetric scaling and quadratic betas.
Simultaneous diversity within session
Four simultaneously recorded cells from the session with the most reward-sensitive cells (9 in total) demonstrate that there is diversity in optimism even within a session. Across cells, responses to middle value levels fall both above and below the linear interpolation between the lowest and highest values’ responses. Mean normalised firing is plotted for each of the 4 value levels. Error bars denote SEM. Firing rates are normalised such that responses to values 1 and 4 have mean firing rates of 0 and 1, respectively. Normalisation allows comparison across cells of responses to middle value levels. Responses to value 2 across the 9 simultaneously recorded cells were significantly diverse; an ANOVA rejected the null hypothesis that across cells the value 2 responses were drawn from the same mean (F(8,405) = 3.56, P = 0.0005). The same was true for responses to value 3 (F(8,441) = 2.16, P = 0.0291). This diversity was also present when including all cells in the analysis (value 2: F(40,1658) = 3.82, P = 2.74 × 10⁻¹⁴; value 3: F(40,1842) = 4.73, P = 4.99 × 10⁻²⁰), and in individual subjects (first animal: value 2, F(11,516) = 3.18, P = 0.0006; value 3, F(10,520) = 3.61, P = 0.0001; second animal: value 2, F(29,1142) = 3.92, P = 2.6 × 10⁻¹¹; value 3, F(29,1322) = 5.47, P = 1.7 × 10⁻¹⁸).
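The diversity test above is a one-way ANOVA across cells on their normalised responses to a middle value level. A minimal sketch on simulated data; the cell means, noise level and trial counts here are invented for illustration, not the recorded values.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
# Hypothetical normalised value-2 responses for 9 simultaneously recorded
# cells: each cell has its own mean (diversity in optimism), and individual
# trials scatter around it.
cell_means = rng.uniform(-0.3, 1.3, size=9)          # illustrative only
responses = [m + 0.4 * rng.normal(size=46) for m in cell_means]

# H0: all cells' value-2 responses are drawn from the same mean
F, p = f_oneway(*responses)
```

Rejecting the null (small p) indicates that middle-value responses differ reliably across cells, i.e. within-session diversity in optimism.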
Lack of diversity in OFC and LPFC
We additionally ran analyses on all reward-selective neurons (as opposed to only RPE-selective neurons) in OFC and LPFC as an exploratory analysis, to assess whether consistent diversity was present when more neurons entered the analysis, since these regions have a smaller proportion of RPE-selective neurons than ACC. Same analysis as in Fig. 1, but for OFC and LPFC, on all reward-selective neurons or RPE-selective neurons. We applied exactly the same criteria and analyses to these brain regions as we did in ACC. As in Fig. 1, we computed the Pearson correlation for each of 1000 independent data partitions, and calculated the mean and geometric mean of the R and p-values, respectively. The coloured (left) histograms are the distributions of the reversal points, and the grey (right) histograms are the log(p-values) from the correlations. a) OFC reward-selective neurons. b) OFC RPE-selective neurons. c) LPFC reward-selective neurons. d) LPFC RPE-selective neurons. With the exception of the reward-selective neurons in OFC (a), none of these analyses was significant. Moreover, when we compared the diversity of these reward-selective neurons in OFC (a) across stimulus sets (that is, the Fig. 1e analysis), the correlation between stimulus sets 1 and 2 was not significant (R = 0.15; P = 0.35). This may suggest the diversity in these OFC neurons is due to, for example, stimulus selectivity, whereby some neurons are selective for stimuli coding the edges of the reward distribution, which could appear as optimism/pessimism in a given stimulus set but would not generalise across stimulus sets as would be expected from diversity related to value. The RPE-selective neurons had no consistent diversity, and as RPE selectivity is a requirement to test further predictions of distributional RL, we did not look for further distributional RL signatures in these brain regions.
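The split-partition consistency analysis (mean R and geometric-mean p over 1000 random partitions) can be sketched as below. The data shape, a neurons-by-trials matrix of per-trial estimates, is an assumption; this is a generic reconstruction, not the paper's code.

```python
import numpy as np
from scipy.stats import pearsonr

def split_half_consistency(trial_metric, n_splits=1000, seed=0):
    """For each random partition, average each neuron's per-trial metric
    within the two halves, correlate the neuron-wise means across halves,
    then summarise with the mean R and the geometric mean of the p-values.
    trial_metric: (n_neurons, n_trials) array."""
    rng = np.random.default_rng(seed)
    n_neurons, n_trials = trial_metric.shape
    rs, log_ps = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n_trials)
        half = n_trials // 2
        a = trial_metric[:, perm[:half]].mean(axis=1)
        b = trial_metric[:, perm[half:]].mean(axis=1)
        r, p = pearsonr(a, b)
        rs.append(r)
        log_ps.append(np.log(p))
    # geometric mean of p-values = exp(mean of log p-values)
    return np.mean(rs), np.exp(np.mean(log_ps))
```

A significant geometric-mean p indicates that the neuron-level property (e.g. reversal point) is consistent across independent halves of the data rather than noise.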


Distributional reinforcement learning in prefrontal cortex
  • Article
  • Full-text available

January 2024

·

99 Reads

·

8 Citations

Nature Neuroscience

Timothy H. Muller

·

James L. Butler

·

Sebastijan Veselic

·

[...]

·

Steven W. Kennerley

The prefrontal cortex is crucial for learning and decision-making. Classic reinforcement learning (RL) theories center on learning the expectation of potential rewarding outcomes and explain a wealth of neural data in the prefrontal cortex. Distributional RL, on the other hand, learns the full distribution of rewarding outcomes and better explains dopamine responses. In the present study, we show that distributional RL also better explains macaque anterior cingulate cortex neuronal responses, suggesting that it is a common mechanism for reward-guided learning.


Figure 2: The value space is represented with a grid-like code at choice in vmPFC. a) Left panel: A value space is organised along the reward magnitude and reward probability values used for choice. Within this value space, the left and right options are embedded as "locations". A trajectory or navigation angle can be computed between each pair of possible locations. Middle & right panels: Navigation angles falling along the grid field of a hypothetical grid cell (aligned) are predicted to elicit stronger oscillatory activity compared to angles that do not fall along the grid field (unaligned). b) Significant hexadirectional (sixfold) modulation in vmPFC but not control symmetries across recording sessions. Each session represents the average of several channels recorded within that session. See also Figure S3A. Error bars represent SEM across sessions (n = 16). * pBonferroni < .05, corrected for symmetries. c) Time course of hexadirectional (sixfold) modulation in vmPFC and other brain regions. Blue shading denotes the original time window in Figure 2B and Figure 1E. Lines above brain regions denote significant hexadirectional (sixfold) encoding at p < .05. See also Figure
Figure 4: vmPFC neurons are theta modulated and maintain a grid-like code and
Figure 5: Sharp wave ripples in the vmPFC. a) Four simultaneously recorded channels.
Figure 6: Accuracy and reward modulate ripple proportion during choice and rest. a)
A cognitive map for value-guided choice in ventromedial prefrontal cortex

December 2023

·

72 Reads

·

1 Citation

The prefrontal cortex is crucial for economic decision-making and representing the value of options. However, how such representations facilitate flexible decisions remains unknown. We reframe economic decision-making in prefrontal cortex in line with representations of structure within the medial temporal lobe because such cognitive map representations are known to facilitate flexible behaviour. Specifically, we framed choice between different options as a navigation process in value space. Here we show that choices in a 2D value space defined by reward magnitude and probability were represented with a grid-like code, analogous to that found in spatial navigation. The grid-like code was present in ventromedial prefrontal cortex (vmPFC) local field potential theta frequency and the result replicated in an independent dataset. Neurons in vmPFC similarly contained a grid-like code, in addition to encoding the linear value of the chosen option. Importantly, both signals were modulated by theta frequency — occurring at theta troughs but on separate theta cycles. Furthermore, we found sharp-wave ripples — a key neural signature of planning and flexible behaviour — in vmPFC, which were modulated by accuracy and reward. These results demonstrate that multiple cognitive map-like computations are deployed in vmPFC during economic decision-making, suggesting a new framework for the implementation of choice in prefrontal cortex.
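Hexadirectional (sixfold) modulation of the kind described above is commonly tested by regressing activity on the sine and cosine of six times the trajectory angle through the value space; the modulation amplitude is the norm of the two betas. A generic sketch of that idea, not this study's exact pipeline:

```python
import numpy as np

def nfold_amplitude(angles, activity, symmetry=6):
    """Estimate n-fold directional modulation: regress activity on
    cos(n*theta) and sin(n*theta); amplitude = sqrt(b_cos^2 + b_sin^2).
    symmetry=6 tests grid-like (hexadirectional) coding; other values
    (e.g. 5 or 7) serve as control symmetries."""
    X = np.column_stack([np.ones_like(angles),
                         np.cos(symmetry * angles),
                         np.sin(symmetry * angles)])
    beta, *_ = np.linalg.lstsq(X, activity, rcond=None)
    return np.hypot(beta[1], beta[2])
```

A grid-like code predicts a large amplitude at sixfold symmetry and amplitudes near noise level at the control symmetries.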


Generative replay underlies compositional inference in the hippocampal-prefrontal circuit

October 2023

·

95 Reads

·

21 Citations

Cell

Human reasoning depends on reusing pieces of information by putting them together in new ways. However, very little is known about how compositional computation is implemented in the brain. Here, we ask participants to solve a series of problems that each require constructing a whole from a set of elements. With fMRI, we find that representations of novel constructed objects in the frontal cortex and hippocampus are relational and compositional. With MEG, we find that replay assembles elements into compounds, with each replay sequence constituting a hypothesis about a possible configuration of elements. The content of sequences evolves as participants solve each puzzle, progressing from predictable to uncertain elements and gradually converging on the correct configuration. Together, these results suggest a computational bridge between apparently distinct functions of hippocampal-prefrontal circuitry and a role for generative replay in compositional inference and hypothesis testing.




Covert valuation for information sampling and choice

October 2021

·

69 Reads

·

6 Citations

We use our eyes to assess the value of objects around us and carefully fixate options that we are about to choose. Neurons in the prefrontal cortex reliably encode the value of fixated options, which is essential for decision making. Yet as a decision unfolds, it remains unclear how prefrontal regions determine which option should be fixated next. Here we show that anterior cingulate cortex (ACC) encodes the value of options in the periphery to guide subsequent fixations during economic choice. In an economic decision-making task involving four simultaneously presented cues, we found rhesus macaques evaluated cues using their peripheral vision. This served two distinct purposes: subjects were more likely to fixate valuable peripheral cues, and more likely to choose valuable options whose cues were never even fixated. ACC, orbitofrontal cortex, dorsolateral prefrontal cortex, and ventromedial prefrontal cortex neurons all encoded cue value post-fixation. ACC was unique, however, in also encoding the value of cues before fixation and even cues that were never fixated. This pre-saccadic value encoding by ACC predicted which cue was next fixated during the decision process. ACC therefore conducts simultaneous processing of peripheral information to guide information sampling and choice during decision making.


Figure 1 - Diverse optimism in value coding across ACC neurons. A) On each trial, subjects chose between two cues of neighbouring probability value. Each probability value could be denoted by two stimuli, resulting in two stimulus sets (see ref. 8 for task details). B) Example neuron responses demonstrating different levels of optimism. In each plot the mean firing rate is plotted as a function of time and split according to the chosen value (probability) level. There are four chosen values (0.3–0.9 probability) as subjects rarely chose the 0.1 probability level (choice accuracy was at ceiling; 98%). Insets demonstrate firing rate is a non-linear function of value. Mean firing rate (z-scored across all trials) in a 200–600 ms window post-cue onset is plotted as a function of the 4 values. Reversal points are the interpolated value at which there is 0 change from the mean firing rate, an index of non-linearity. C) Histogram showing a diversity of reversal points across neurons. D) Scatter plot showing reversal points estimated in one half of the data strongly predict those in the other half. Each point denotes a neuron. Inset: log p-values of the correlation between 1000 different random splits of the data into independent partitions. Black line denotes the geometric mean of these p-values. E) Scatter plot showing reversal points estimated in stimulus set 1 strongly predict those in stimulus set 2. Each point denotes a neuron. F) Anterior-posterior topographic location of the neuron predicts its reversal point, with more anterior ACC neurons more optimistic. Each point denotes a neuron.
Figure 2 - Diverse asymmetric scaling of reward prediction errors predicts choice optimism in ACC. A) An example neuron's responses at each of the task epochs: choice, feedback on rewarded trials, and feedback on unrewarded trials, demonstrating where in the task the betas are measured for RPE scaling asymmetries. β⁺ and β⁻ are the betas corresponding to the scaling of positive and negative RPEs. B) Histogram showing a diversity of asymmetric scaling across ACC RPE neurons. C) Same format as Figure 1D but for asymmetric scaling consistency. Each point denotes a neuron; RPE-selective neurons only. D) Asymmetric scaling estimated at feedback predicts reversal point at choice. Each point denotes a neuron.
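The asymmetric scaling in the caption above connects to distributional RL's expectile-style learning rule, in which positive and negative RPEs are scaled by different learning rates β⁺ and β⁻, so that a unit's asymmetry τ = β⁺/(β⁺ + β⁻) sets the expectile it converges to. A toy sketch of that rule; the parameter values are invented for illustration.

```python
import numpy as np

def asymmetric_update(value, rpe, beta_pos, beta_neg):
    """One distributional-RL style value update: positive and negative
    RPEs are scaled by different learning rates. Units with
    tau = beta_pos / (beta_pos + beta_neg) > 0.5 are optimistic and
    settle above the mean reward; tau < 0.5 units settle below it."""
    lr = beta_pos if rpe > 0 else beta_neg
    return value + lr * rpe

# Illustrative: learn from a Bernoulli(0.5) reward with an optimistic
# asymmetry (tau = 0.06 / 0.08 = 0.75); the value estimate settles near
# the 0.75 expectile rather than the mean of 0.5.
rng = np.random.default_rng(5)
v = 0.0
for _ in range(5000):
    r = float(rng.random() < 0.5)
    v = asymmetric_update(v, r - v, beta_pos=0.06, beta_neg=0.02)
```

A population of such units with diverse τ collectively encodes the reward distribution, which is the distributional-RL prediction the figure tests.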
Distributional reinforcement learning in prefrontal cortex

June 2021

·

157 Reads

·

2 Citations

Prefrontal cortex is crucial for learning and decision-making. Classic reinforcement learning (RL) theories centre on learning the expectation of potential rewarding outcomes and explain a wealth of neural data in prefrontal cortex. Distributional RL, on the other hand, learns the full distribution of rewarding outcomes and better explains dopamine responses. Here we show distributional RL also better explains prefrontal cortical responses, suggesting it is a ubiquitous mechanism for reward-guided learning.


Figure 3. (A) 'Silhouette algebra' (cf., Eslami et al., 2018) to test for a conjunctive output representation in fMRI. We designed an analysis to test for generalisable representations of individual building blocks in specific relational positions by performing algebraic operations with neural representations for different silhouettes. For given building blocks WXYZ, the silhouette
Figure 5. Conjunctive representations akin to the 'silhouette algebra' from Figure 3B over time using RSA. (A) Left: We defined a theoretical similarity reflecting the overlap of building blocks in specific relations across silhouettes, and tested whether this similarity predicts empirical similarities of MEG sensor patterns across trials and time-points. Right: We found a significant conjunctive representation, that is representations of silhouettes according to their constituent building blocks in specific relations, during a confined time-window of 200-1000ms in the planning phase (significance assessed using a nonparametric permutation test, see methods for details). (B) We also found effects for shape (pixel) and size representational overlap during a similar time window in planning but with a slightly earlier onset. Note that the purple line in A and B are the same.
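The RSA described in the caption above correlates a theoretical model RDM with the empirical dissimilarity of neural patterns across conditions. A generic RSA sketch under assumed data shapes, not the study's pipeline:

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist

def rsa_correlation(patterns, model_rdm):
    """Correlate the condition-by-condition dissimilarity of measured
    patterns with a theoretical model RDM (upper triangles, Spearman).
    patterns: (n_conditions, n_features) sensor/voxel patterns;
    model_rdm: (n_conditions, n_conditions) theoretical dissimilarity."""
    data_vec = pdist(patterns, metric='correlation')   # 1 - Pearson r
    iu = np.triu_indices(model_rdm.shape[0], k=1)      # matches pdist order
    rho, _ = spearmanr(data_vec, model_rdm[iu])
    return rho
```

Run at each time point of the trial, this yields the time course of (for example) conjunctive representation strength shown in the figure.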
Generative replay for compositional visual understanding in the prefrontal-hippocampal circuit

June 2021

·

305 Reads

·

8 Citations

Understanding the visual world is a constructive process. Whilst a frontal-hippocampal circuit is known to be essential for this task, little is known about the associated neuronal computations. Visual understanding appears superficially distinct from other known functions of this circuit, such as spatial reasoning and model-based planning, but recent models suggest deeper computational similarities. Here, using fMRI, we show that representations of a simple visual scene in these brain regions are relational and compositional - key computational properties theorised to support rapid construction of hippocampal maps. Using MEG, we show that rapid sequences of representations, akin to replay in spatial navigation and planning problems, are also engaged in visual construction. Whilst these sequences have previously been proposed as mechanisms to plan possible futures or learn from the past, here they are used to understand the present. Replay sequences form constructive hypotheses about possible scene configurations. These hypotheses play out in an optimal order for relational inference, progressing from predictable to uncertain scene elements, gradually constraining possible configurations, and converging on the correct scene configuration. Together, these results suggest a computational bridge between apparently distinct functions of hippocampal-prefrontal circuitry, and a role for generative replay in constructive inference and hypothesis testing.


Figure 3. The relational structure of the task is represented in the entorhinal cortex Top: relational structure effect, peaking in EC. Bottom: stimulus identity effect, peaking in LOC. (A) Model RDMs. Black elements should be similar, white elements should be dissimilar. Pairs of stimuli with purple and orange rectangles around them are −Corr and +Corr, respectively. (B) Visualization of the data RDM from peak vertex of the effect, marked with an arrow in (D). (C) Visualization of the paired mean difference effects between same (black RDM elements in A) and different (white elements in A) pairs of conditions from the peak vertex of the effects. Both groups are plotted on the left axes as a slope-graph: each paired set of observations for one subject is connected by a line. The paired mean difference is plotted on a floating axis on the right, aligned to the mean of the same group. The mean difference is depicted by a dashed line (consequently aligned to the mean of the diff group). Error bars indicate the 95% confidence interval obtained by a bootstrap procedure. (D) Whole surface results, right hemisphere. Clusters surviving FWE correction across the whole surface at a cluster forming threshold of p < 0.001 are indicated in green. (E and F) Average data RDMs (left) across the entire (anatomically defined) right EC, and dendrograms constructed from them (right). (E) Same GLM as in (B-D) (GLM2); (F) A GLM where the two related stimuli in each block were collapsed onto a single regressor (GLM2a). The control stimuli were omitted from the data RDMs for visualization purposes but are included in the dendrograms (labeled '0').
Figure 4. Prediction error signals in vmPFC and ventral striatum depend on the current relational structure of the task (A) Visualization of whole-surface results of the multivariate prediction error × relational structure interaction effect, medial left hemisphere. (B) Interaction effect at the left hemisphere vmPFC peak of the univariate prediction error effect (MNI: [−4, 44, −20]). (C) Interaction effect at the right hemisphere vmPFC peak of the univariate prediction error effect (MNI: [8, 44, −11]). (D) Interaction effect at the ventral striatum peak of the univariate prediction error effect (MNI: [−10, 8, −12]). Brain images in the insets of (B), (C), and (D) show the univariate prediction error effect (projected on the surface in B and C). Legend for (B), (C), and (D) is the same as in Figure 3C.
Entorhinal and ventromedial prefrontal cortices abstract and generalize the structure of reinforcement learning problems

December 2020

·

200 Reads

·

78 Citations

Neuron

Knowledge of the structure of a problem, such as relationships between stimuli, enables rapid learning and flexible inference. Humans and other animals can abstract this structural knowledge and generalize it to solve new problems. For example, in spatial reasoning, shortest-path inferences are immediate in new environments. Spatial structural transfer is mediated by cells in entorhinal and (in humans) medial prefrontal cortices, which maintain their co-activation structure across different environments and behavioral states. Here, using fMRI, we show that entorhinal and ventromedial prefrontal cortex (vmPFC) representations perform a much broader role in generalizing the structure of problems. We introduce a task-remapping paradigm, where subjects solve multiple reinforcement learning (RL) problems differing in structural or sensory properties. We show that, as with space, entorhinal representations are preserved across different RL problems only if task structure is preserved. In vmPFC and ventral striatum, representations of prediction error also depend on task structure.


The Tolman-Eichenbaum Machine: Unifying Space and Relational Memory through Generalization in the Hippocampal Formation

November 2020

·

1,064 Reads

·

405 Citations

Cell

The hippocampal-entorhinal system is important for spatial and relational memory tasks. We formally link these domains, provide a mechanistic understanding of the hippocampal role in generalization, and offer unifying principles underlying many entorhinal and hippocampal cell types. We propose medial entorhinal cells form a basis describing structural knowledge, and hippocampal cells link this basis with sensory representations. Adopting these principles, we introduce the Tolman-Eichenbaum machine (TEM). After learning, TEM entorhinal cells display diverse properties resembling apparently bespoke spatial responses, such as grid, band, border, and object-vector cells. TEM hippocampal cells include place and landmark cells that remap between environments. Crucially, TEM also aligns with empirically recorded representations in complex non-spatial tasks. TEM also generates predictions that hippocampal remapping is not random as previously believed; rather, structural knowledge is preserved across environments. We confirm this structural transfer over remapping in simultaneously recorded place and grid cells.


Citations (14)


... See also Supplemental Figure 1 and Supplemental Tables 2-4. A growing body of research suggests that information represented by the frontal cortices may not be linearly encoded in single neurons, but instead emergent from neural activity at the population level, with individual cells exhibiting mixed selectivity for internal and external variables (56)(57)(58)(59)(60)(61)(62)(63)(64). To determine if the representations of the task-related information differed in adolescent and adult dmPFC population activity, we trained a support vector machine to predict the identity of the 'go' or 'no-go' cues from the population calcium activity from early learning (sessions 1-5, before most animals had achieved expert status) or late learning (sessions 6+ in which animals achieved expert status) sessions. ...

Reference:

The adolescent frontal cortex shows stronger population-level encoding of information than the adult during a putative sensitive period
Distributional reinforcement learning in prefrontal cortex

Nature Neuroscience

... The grid cells code also generalises: it abstracts over the sensory particularities of spatial environments [5][6][7] . Recent research has argued grid cells also encode non-spatial and abstract 2D spaces [8][9][10][11][12] , raising the possibility that the same neural system might function as a coordinate system for multiple 2D domains. ...

A cognitive map for value-guided choice in ventromedial prefrontal cortex

... Humans excel at combining linguistic building blocks to infer the meanings of novel compositional words, such as "un-reject-able-ish". How do we accomplish this? While recent research on compositional generalization in relational memory, action planning and vision strongly implicates a medial prefrontal-hippocampal network (Baram et al., 2021; Barron et al., 2020;Schwartenbeck et al., 2023), it remains unclear whether the same network supports compositional inference in language. To this end, we trained participants on an artificial language in which the meanings of compositional words could be derived from known stems and unknown affixes, using abstract affixation rules (e.g., "good-kla" which means "bad", where "kla" reverses the meaning of the stem word "good"). ...

Generative replay underlies compositional inference in the hippocampal-prefrontal circuit
  • Citing Article
  • October 2023

Cell

... On the other hand, our model's ability to generate samples from learned sensory inputs is essential for offline replay. Cognitive functions that rely on offline replay include memory consolidation [73], planning of future actions [74], visual understanding [75], predictions [76], and decision-making [77]. Taken together, the neural activity of MCPC provides the basis upon which a wide array of other brain functions depend. ...

Generative Replay for Compositional Visual Understanding in the Prefrontal-Hippocampal Circuit
  • Citing Article
  • January 2022

SSRN Electronic Journal

... A similar view holds in perceptual decision-making, where it has been shown that several competing motor outputs can be prepared in parallel by partially segregated neural populations in premotor and prefrontal areas 36,37 . However, some brain areas seem to exclusively represent the options that are attended 13,[38][39][40] , with some work also showing that neurons in OFC and value-coding brain regions in general represent offer values in alternation, not in parallel 26 . Therefore, it is still unclear how many alternative offers can be simultaneously encoded, to what extent offer values are simultaneously computed and compared, and how these processes depend on gaze and attention patterns. ...

Covert valuation for information sampling and choice

... Temporal difference learning always converges to the true stimulus values; however, people often deviate from this linear learning trajectory in important ways. For example, there are asymmetric learning rates for rewarding vs. punishing stimuli (Muller et al., 2021), the value function to be learned can be non-linear and learning itself is often distributed (François-Lavet et al., 2018). Further, when the number of states (e.g., a spatial location, a physiological state of being, or a mental state of mind) is large the time it takes to learn the value function is infeasible for animals. ...

Distributional reinforcement learning in prefrontal cortex

... On the other hand, our model's ability to generate samples from learned sensory inputs is essential for offline replay. Cognitive functions that rely on offline replay include memory consolidation [69], planning of future actions [70], visual understanding [71], predictions [72], and decision-making [73]. Taken together, the neural activity of MCPC provides the basis upon which a wide array of other brain functions depend. ...

Generative replay for compositional visual understanding in the prefrontal-hippocampal circuit

... One particularly effective way of achieving this abstraction is by separating (or "factorizing") the representation of the relational structure of the environment from its specific sensory details (Behrens et al., 2018; Whittington et al., 2020). A potential locus for such an abstracted representation is the medial prefrontal cortex (mPFC), which has been implicated in recognising commonalities across experiences in the schema literature (Baldassano et al., 2018; El-Gaby et al., 2023; Gilboa and Marlatte, 2017; van Kesteren et al., 2013; Tse et al., 2007, 2011) and using these for inference (Zeithamova and Preston, 2010), as well as in concept learning (Kumaran et al., 2009; Mack et al., 2020), structural generalisation of reinforcement learning problems (Baram et al., 2021; Samborska et al., 2022) and generalisation of spatial tasks across different paths (Kaefer et al., 2020; Morrissey et al., 2017; Tang et al., 2023; Yu et al., 2018). ...

Entorhinal and ventromedial prefrontal cortices abstract and generalize the structure of reinforcement learning problems

Neuron

... Progress in parsing out the processes that support flexible learning in humans will require further careful investigation of different cognitive mechanisms and their interactions (Collins & Cockburn, 2020). Such findings in the cognitive domain should inform AI research, leading to the development of innovative (non-RL) algorithms that support more flexible RL behavior (Whittington et al., 2020). ...

The Tolman-Eichenbaum Machine: Unifying Space and Relational Memory through Generalization in the Hippocampal Formation

Cell

... Humans excel at combining linguistic building blocks to infer the meanings of novel compositional words, such as "un-reject-able-ish". How do we accomplish this? While recent research on compositional generalization in relational memory, action planning and vision strongly implicates a medial prefrontal-hippocampal network (Baram et al., 2021; Barron et al., 2020;Schwartenbeck et al., 2023), it remains unclear whether the same network supports compositional inference in language. To this end, we trained participants on an artificial language in which the meanings of compositional words could be derived from known stems and unknown affixes, using abstract affixation rules (e.g., "good-kla" which means "bad", where "kla" reverses the meaning of the stem word "good"). ...

Entorhinal and ventromedial prefrontal cortices abstract and generalise the structure of reinforcement learning problems
  • Citing Preprint
  • November 2019