Figure - available from: Nature Human Behaviour
Algorithmic models of latent-state instrumental reinforcement learning using prototype or exemplar with discriminative attention
a, Depending on feedback, novel examples either refine an existing state (that is, internal state representation) or create a new one. b, On a given trial, a ProDAtt or ExDAtt model agent encounters an example vector of features. (1) To recognize the latent context, the agent uses a Bayesian surprise threshold. Bayesian surprise is a well-described transformation of the posterior probability⁵⁹. If no state is less surprising than a threshold, the agent uses the vector to create a new state. If more than one state is below the threshold, the agent includes those states in the context. (2) The agent then calculates the mutual information of each feature. By comparing the entropy of a feature within each state to its entropy across states, the mutual information identifies feature dimensions that maximally discriminate between states in a context. Attention weights (derived from the mutual information) are modulated by the integrated reward history, such that shifts in reward statistics increase overall attention. (3) Attention weights are used to scale the feature values of each state. (4) The agent then uses attention-scaled features in state estimation, recalculating the surprise metric. If no state is below the second surprise threshold, the agent creates a new state. Otherwise, it selects the least surprising state. Each state learns a set of values for the available actions in the task. (5) Once a state has been selected, that state’s values are used to choose an action. (6) The agent updates the state representation and the action value. It also tracks the reward history. c, In the ‘prototype states with discriminative attention’ (ProDAtt) model, states are defined by the mean and covariance of past examples, while in the ‘exemplar states with discriminative attention’ (ExDAtt) model, every past state exemplar is used. d, The surprise threshold determines whether a state is included in the context. e, The attention feedback step uses mutual information to scale feature dimensions by their discriminative informativeness (left). Either the most proximal state is selected or, if the distance is greater than the threshold, the agent creates a new state (right, ProDAtt model example).
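To make the trial loop concrete, the sketch below implements one pass of a ProDAtt-style agent with diagonal-Gaussian prototype states. It is a minimal illustration, not the published model: the surprise, mutual-information and reward-modulation formulas are generic stand-ins for the paper's equations, a single threshold stands in for the two surprise thresholds, and all names (State, surprise, attention, trial, reward_gain) are hypothetical.

```python
# Minimal sketch of one ProDAtt-style trial (hypothetical names; generic
# stand-ins for the paper's surprise and attention equations).
import numpy as np

rng = np.random.default_rng(0)

class State:
    """Prototype state: running mean/variance of past examples plus Q-values."""
    def __init__(self, x, n_actions):
        self.mean = np.asarray(x, dtype=float).copy()
        self.var = np.ones_like(self.mean)   # diagonal covariance, for simplicity
        self.n = 1
        self.q = np.zeros(n_actions)         # action values learned per state

def surprise(x, s, w=None):
    """Stand-in for Bayesian surprise: Gaussian negative log-likelihood,
    optionally with each feature's contribution rescaled by attention w."""
    w = np.ones_like(s.mean) if w is None else w
    nll = 0.5 * (np.log(2 * np.pi * s.var) + (x - s.mean) ** 2 / s.var)
    return float(np.sum(w * nll))

def attention(context, reward_gain):
    """Mutual-information-style weights: a feature is discriminative when its
    spread across state prototypes exceeds its spread within states."""
    within = np.mean([s.var for s in context], axis=0)
    across = np.var([s.mean for s in context], axis=0)
    mi = 0.5 * np.log1p(across / (within + 1e-8))  # Gaussian entropy difference
    w = mi / (mi.sum() + 1e-12)
    return 1.0 + reward_gain * len(w) * w    # reward shifts raise overall attention

def trial(x, states, thr, reward_gain, n_actions, env, alpha=0.1, beta=3.0):
    x = np.asarray(x, dtype=float)
    # (1) Context recognition: keep states whose surprise is below threshold.
    context = [s for s in states if surprise(x, s) < thr]
    if not context:
        s = State(x, n_actions)              # nothing familiar: create a state
        states.append(s)
    else:
        # (2)-(3) Attention weights from mutual information over the context.
        w = attention(context, reward_gain) if len(context) > 1 else None
        # (4) Second pass: select the least surprising attention-scaled state.
        s = min(context, key=lambda c: surprise(x, c, w))
        if surprise(x, s, w) >= thr:         # still too surprising: new state
            s = State(x, n_actions)
            states.append(s)
    # (5) Softmax action selection from the chosen state's action values.
    p = np.exp(beta * s.q) / np.exp(beta * s.q).sum()
    a = int(rng.choice(n_actions, p=p))
    r = env(a)
    # (6) Update the state representation and the chosen action's value.
    s.n += 1
    s.mean += (x - s.mean) / s.n
    s.var = np.maximum(s.var + ((x - s.mean) ** 2 - s.var) / s.n, 1e-6)
    s.q[a] += alpha * (r - s.q[a])
    return s, a, r
```

Run over a sequence of trials, an agent of this form carves the feature space into as many prototype states as the surprise threshold allows, the clustering behaviour panels d and e depict; an ExDAtt variant would instead retain every example per state and score surprise against the stored exemplars.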


Source publication
Article
Full-text available
The world is overabundant with feature-rich information obscuring the latent causes of experience. How do people approximate the complexities of the external world with simplified internal representations that generalize to novel examples or situations? Theories suggest that internal representations could be determined by decision boundaries that d...

Similar publications

Preprint
Full-text available
Passage retrieval aims to retrieve relevant passages from large collections of the open-domain corpus. Contextual Masked Auto-Encoding has been proven effective in representation bottleneck pre-training of a monolithic dual-encoder for passage retrieval. Siamese or fully separated dual-encoders are often adopted as basic retrieval architecture in t...

Citations

... Hybrid Concept Learning Using Bayesian Principles. Today, the most prolific theories of concept learning are hybrids that have a duality of both rule- and similarity-based interpretations (Pettine et al. 2023). One influential example is Bayesian concept learning (Figure 1d; Tenenbaum & Griffiths 2001), which uses a distribution over hypothesized category boundaries (rectangles in Figure 1d) to categorize novel stimuli (Sidebar 2.1). ...
Article
Full-text available
Generalization, defined as applying limited experiences to novel situations, represents a cornerstone of human intelligence. Our review traces the evolution and continuity of psychological theories of generalization, from its origins in concept learning (categorizing stimuli) and function learning (learning continuous input-output relationships) to domains such as reinforcement learning and latent structure learning. Historically, there have been fierce debates between approaches based on rule-based mechanisms, which rely on explicit hypotheses about environmental structure, and approaches based on similarity-based mechanisms, which leverage comparisons to prior instances. Each approach has unique advantages: Rules support rapid knowledge transfer, while similarity is computationally simple and flexible. Today, these debates have culminated in the development of hybrid models grounded in Bayesian principles, effectively marrying the precision of rules with the flexibility of similarity. The ongoing success of hybrid models not only bridges past dichotomies but also underscores the importance of integrating both rules and similarity for a comprehensive understanding of human generalization.
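The citing excerpt above summarizes Bayesian concept learning, in which a learner maintains a distribution over hypothesized category boundaries and favours smaller hypotheses that still contain the observed examples (the "size principle"). A minimal one-dimensional sketch, using intervals in place of the rectangles of Figure 1d (illustrative only; not Tenenbaum & Griffiths' implementation):

```python
# Toy Bayesian concept learning with the size principle: hypotheses are
# intervals, and smaller hypotheses consistent with the data get more weight.
import itertools
import numpy as np

examples = [3.0, 4.5]                        # observed members of the concept
grid = np.linspace(0.0, 10.0, 21)
hypotheses = list(itertools.combinations(grid, 2))  # all intervals (a, b), a < b

def likelihood(h, xs):
    a, b = h
    if all(a <= x <= b for x in xs):         # hypothesis must contain the data
        return (1.0 / (b - a)) ** len(xs)    # size principle: smaller is likelier
    return 0.0

post = np.array([likelihood(h, examples) for h in hypotheses])
post /= post.sum()                           # posterior under a uniform prior

def p_in_concept(y):
    """Generalization: total posterior mass of intervals containing y."""
    return sum(p for h, p in zip(hypotheses, post) if h[0] <= y <= h[1])

print(p_in_concept(4.0))                     # high: between the two examples
print(p_in_concept(7.0))                     # low: far outside them
```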
... Perception in neural circuits is fundamentally belief in a description of the world: we see trees and leaves, not green and brown pixels (Gibson 1977; Rao and Ballard 1999; Friston 2005; Adams et al. 2013; Sterzer et al. 2018). Perception entails a process of categorization and generalization through parallel distributed processing that depends on experience and expertise (Grossberg 1976; Hertz et al. 1991; McClelland and Rogers 2003; Pettine et al. 2023). Motivation and evaluation are computational memory processes of their own (Balleine and Dickinson 1991; Andermann and Lowell 2017; Sharpe et al. 2021). ...
Preprint
Full-text available
Current theories of decision making suggest that the neural circuits in mammalian brains (including humans) computationally combine representations of the past (memory), present (perception), and future (agentic goals) to take actions that achieve the needs of the agent. How information is represented within those neural circuits changes what computations are available to that system, which in turn changes how agents interact with their world to take those actions. We argue that the computational neuroscience of decision making provides a new microeconomic framework (neuroeconomics) that offers new opportunities to construct policies that interact with those decision-making systems to improve outcomes. After laying out the computational processes underlying decision making in mammalian brains, we present four applications of this logic with policy consequences: (1) contingency management as a treatment for addiction, (2) precommitment and the sensitivity to sunk costs, (3) media consequences for changes in housing prices after a disaster, and (4) how social interactions underlie the success (and failure) of microfinance institutions.
... Hybrid Concept Learning Using Bayesian Principles. Today, the most prolific theories of concept learning are considered hybrids and have a duality of both rule- and similarity-based interpretations (Pettine et al. 2023). One influential example is the Bayesian concept learning framework (Figure 1d; Tenenbaum & Griffiths 2001), which uses a distribution over hypothesized category boundaries (boxes in Figure 1d) to categorize novel stimuli (Sidebar 2.1). ...
Preprint
Full-text available
Generalization, defined as applying limited experiences to novel situations, represents a cornerstone of human intelligence. Our review traces the evolution and continuity of psychological theories of generalization, from origins in concept learning (categorizing stimuli) and function learning (learning continuous input-output relationships), to domains such as reinforcement learning and latent structure learning. Historically, there have been fierce debates between rule-based mechanisms, which rely on explicit hypotheses about environmental structure, and similarity-based mechanisms, which leverage comparisons to prior instances. Each approach has unique advantages: rules support rapid knowledge transfer, while similarity is computationally simple and flexible. Today, these debates have culminated in the development of hybrid models grounded in Bayesian principles, effectively marrying the precision of rules with the flexibility of similarity. The ongoing success of hybrid models not only bridges past dichotomies but also underscores the importance of integrating both rules and similarity for a comprehensive understanding of human generalization.
... While some studies have suggested the use of correlations in state spaces as a mechanism [35,36], most research focuses on representation learning [9]. This process describes the discovery of an abstracted representation and emphasizes dimensionality reduction [9,37,38] or a rescaling of dimensions [39] of the state space. Neural results in this line of research have emphasized the role of the FPN [9,37,38,40], the SN [40][41][42] and the DMN [9,43] in the discovery and encoding of abstracted representations. ...
... In a Bayesian model by Soto et al. [31], consequential regions are parameterized by a mean and the width along each dimension. This shows a close resemblance to very recent developments in RL, where generalization is assumed to rely on the mean and covariance of observed examples in some cognitive space with a rescaling of dimensions according to their perceived relevance for the task [39]. Likewise, exploiting correlations in the reward probability of tasks [35,36] is conceptually similar to the model of Lee et al. [47], where generalization is based on the assumption that similar stimuli lead to similar outcomes. ...
... The specific priors are described with the models. To avoid divergent transitions, we used a non-centered parameterization of all hierarchical parameters (39). All models were run with four chains until they converged. ...
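For readers unfamiliar with the trick mentioned in this excerpt: a non-centered parameterization samples standardized offsets and rescales them, rather than sampling group-dependent parameters directly. A minimal numpy illustration of the reparameterization itself (the cited work applies it inside its MCMC model; all names here are hypothetical):

```python
# Non-centered parameterization of a hierarchical Normal parameter.
import numpy as np

rng = np.random.default_rng(1)
n_subjects, mu, tau = 8, 0.5, 0.3            # group mean and group scale

# Centered form: sample each subject-level parameter directly.
#   theta_i ~ Normal(mu, tau)
theta_centered = rng.normal(mu, tau, n_subjects)

# Non-centered form: theta_i = mu + tau * theta_raw_i, theta_raw_i ~ Normal(0, 1).
# Decoupling theta from (mu, tau) removes the funnel-shaped posterior geometry
# that causes divergent transitions in Hamiltonian Monte Carlo samplers.
theta_raw = rng.normal(0.0, 1.0, n_subjects)
theta_noncentered = mu + tau * theta_raw
```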
Preprint
Full-text available
Generalization, the transfer of knowledge to novel situations, has been studied in distinct disciplines that focus on different aspects. Here we propose a Bayesian model that assumes an exponential mapping from psychological space to outcome probabilities. This model is applicable to probabilistic reinforcement and integrates representation learning by tracking the relevance of stimulus dimensions. Since the belief state about this mapping depends on prior knowledge, we designed three experiments that emphasized this aspect. In all studies, we found behavior to be influenced by prior knowledge in a way that is consistent with the model. In line with the literature on representation learning, we found the representational geometry in the middle frontal gyrus to correspond to the behavioral preference for one stimulus dimension over the other and to be updated as predicted by the model. We interpret these findings as support for a common mechanism of generalization.
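The abstract's central assumption, an exponential mapping from psychological space to outcome probabilities with dimension-relevance weighting, can be sketched in a few lines (a Shepard-style toy, not the authors' model; the relevance weights and the chance floor are assumptions of this illustration):

```python
# Exponential generalization from psychological distance to outcome probability.
import numpy as np

def p_outcome(x, x_known, p_known, relevance, floor=0.5):
    """Generalize a known outcome probability p_known at stimulus x_known to a
    novel stimulus x: similarity decays exponentially with relevance-weighted
    distance, and the prediction relaxes toward chance (floor) as it vanishes."""
    d = np.sqrt(np.sum(np.asarray(relevance)
                       * (np.asarray(x) - np.asarray(x_known)) ** 2))
    sim = np.exp(-d)
    return sim * p_known + (1.0 - sim) * floor

# Distance along a down-weighted dimension matters less, so the first call
# (which differs only on that dimension) stays closer to p_known = 0.9.
print(p_outcome([1.0, 0.0], [0.0, 0.0], 0.9, relevance=[0.1, 1.0]))
print(p_outcome([0.0, 1.0], [0.0, 0.0], 0.9, relevance=[0.1, 1.0]))
```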
Article
The brain is always intrinsically active, using energy at high rates while cycling through global functional modes. Awake brain modes are tied to corresponding behavioural states. During goal-directed behaviour, the brain enters an action-mode of function. In the action-mode, arousal is heightened, attention is focused externally and action plans are created, converted to goal-directed movements and continuously updated on the basis of relevant feedback, such as pain. Here, we synthesize classical and recent human and animal evidence that the action-mode of the brain is created and maintained by an action-mode network (AMN), which we had previously identified and named the cingulo-opercular network on the basis of its anatomy. We discuss how annotating this network functionally, as controlling the action-mode of the brain, rather than continuing to name it anatomically, increases its distinctiveness from spatially adjacent networks and accounts for the large variety of functions associated with the AMN, such as increasing arousal, processing of instructional cues, task-general initiation transients, sustained goal maintenance, action planning, sympathetic drive for controlling physiology and internal organs (connectivity to adrenal medulla), and action-relevant bottom-up signals such as physical pain, errors and viscerosensation. In the functional mode continuum of the awake brain, the AMN-generated action-mode sits opposite the default-mode for self-referential, emotional and memory processing, with the default-mode network and AMN counterbalancing each other as yin and yang.
Preprint
Full-text available
Most existing attention prediction research focuses on salient instances like humans and objects. However, the more complex interaction-oriented attention, arising from the comprehension of interactions between instances by human observers, remains largely unexplored. This is equally crucial for advancing human-machine interaction and human-centered artificial intelligence. To bridge this gap, we first collect a novel gaze fixation dataset named IG, comprising 530,000 fixation points across 740 diverse interaction categories, capturing visual attention during human observers' cognitive processing of interactions. Subsequently, we introduce the zero-shot interaction-oriented attention prediction task ZeroIA, which challenges models to predict visual cues for interactions not encountered during training. Thirdly, we present the Interactive Attention model IA, designed to emulate human observers' cognitive processes to tackle the ZeroIA problem. Extensive experiments demonstrate that the proposed IA outperforms other state-of-the-art approaches in both ZeroIA and fully supervised settings. Lastly, we endeavor to apply interaction-oriented attention to the interaction recognition task itself. Further experimental results demonstrate the promising potential to enhance the performance and interpretability of existing state-of-the-art HOI models by incorporating real human attention data from IG and attention labels generated by IA.