Warren Woodrich Pettine’s research while affiliated with Yale-New Haven Hospital and other places


Publications (6)


Building resources for the diversification of genomic data on suicide mortality
  • Article

October 2024 · 9 Reads · European Neuropsychopharmacology

Bichitra Nand Patra · [...] · Anna Docherty

Attention and learning strategies reveal distinct profiles of psychiatric traits

June 2024 · 5 Reads

Humans presented with the same problem in the same environment commonly adopt wildly different strategies for attention and learning. However, most tasks measure an individual's deviation from a single expected strategy rather than the use of distinct strategies. Measuring diverse strategies is especially important for psychiatry, where conditions are defined by qualitatively distinct patterns of behavior. We paired psychiatric trait questionnaires with a context generalization task whose metrics for goal-directed attention and short-term memory identify qualitatively distinct strategies. Questionnaires assessed traits associated with autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), obsessive-compulsive disorder (OCD), depression, schizotypy and psychosis. The subject population, recruited online, was matched for self-reported sex, and the sample was enriched with those reporting a formal diagnosis of ASD. 744 subjects completed the first session of the task, and 584 returned after four to six weeks to complete the second session. We found that a strategy dominated by goal-directed attention was associated with a profile of reduced trait scores relative to other subjects across all measures. During the second session, this strategy was particularly pronounced in those with reduced ADHD traits. In contrast, a strategy of attending to features based on their frequency was associated with a profile of increased trait scores relative to other subjects, particularly ASD and OCD traits. During the second session, this strategy was again associated with elevated traits, particularly ASD and ADHD traits. These results provide insight into the relationship between psychiatric traits and qualitatively distinct attention and learning strategies.


Distance and discriminative models of latent-state learning produce distinct generalization failures
a, Prototype and exemplar models rely on a distance metric to determine state membership. This produces states whose weights on individual feature dimensions (shape of the cluster) do not depend on their informativeness for comparative decisions. A discriminative model (right) defines states on the basis of feature dimensions that are informative for separating one state from another. This produces state boundaries defined by those dimensions most informative for discrimination. b, Common grocery store items can be considered latent states of which one encounters only variable examples. Each item is associated with sets of actions. c, A distance model can fail to generalize when a highly regular, yet non-informative feature changes in novel examples. During initial learning, the packaging is highly regular across examples (left), and thus it is considered informative (middle). When this feature is altered in novel examples, a distance model struggles to generalize the rewarded action (right). d, A purely discriminative model can fail to generalize when previously non-informative features become informative in a novel context. During learning in the initial contexts, packaging was non-discriminative (left). Thus, the distance is zero between otherwise-identical alternatives learned in separate contexts (middle). This leads to generalization failure when these alternatives are brought together in a novel context (right).
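The two failure modes in this caption can be illustrated with a minimal sketch (hypothetical feature vectors, weights and threshold; not the paper's implementation). A shared "packaging" feature is highly regular but non-informative; when it changes in a novel example, an equally weighted distance model exceeds its state-membership threshold and spawns a new state, while weighting features by their between-state informativeness still recovers the learned state:

```python
import numpy as np

# Two learned prototype states over features [shape, colour, packaging].
# Packaging is identical across states: highly regular, but non-informative.
prototypes = {
    "cereal":  np.array([0.9, 0.2, 1.0]),
    "oatmeal": np.array([0.1, 0.8, 1.0]),
}

def assign(x, weights, threshold=0.8):
    """Assign x to the nearest state under a weighted distance,
    or create a new state if every state is too far away."""
    dists = {s: np.linalg.norm(weights * (x - p)) for s, p in prototypes.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] < threshold else "new state"

# Novel cereal example whose packaging has changed (last feature 1.0 -> 0.0).
novel = np.array([0.9, 0.2, 0.0])

# Distance model: all features weighted equally -> fails to generalize.
equal = np.ones(3)
# Discriminative weighting: between-state variance as a toy proxy for a
# feature's informativeness -> the shared packaging feature gets zero weight.
informative = np.stack(list(prototypes.values())).var(axis=0)

print(assign(novel, equal))        # creates a new state
print(assign(novel, informative))  # generalizes to "cereal"
```

The same machinery also exposes the converse failure in panel d: if a feature is non-discriminative during learning, a purely discriminative weighting assigns it zero weight and cannot separate otherwise-identical states later.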
Algorithmic models of latent-state instrumental reinforcement learning using prototype or exemplar states with discriminative attention
a, Depending on feedback, novel examples either refine an existing state (that is, internal state representation) or create a new one. b, On a given trial, a ProDAtt or ExDAtt model agent encounters an example vector of features. (1) To recognize the latent context, the agent uses a Bayesian surprise threshold. Bayesian surprise is a well-described transformation of the posterior probability⁵⁹. If no state is less surprising than a threshold, the agent uses the vector to create a new state. If more than one state is below the threshold, the agent includes those states in the context. (2) The agent then calculates the mutual information of each feature. By comparing the entropy of a feature within each state to its entropy across states, the mutual information identifies feature dimensions that maximally discriminate between states in a context. Attention weights (from the mutual information) are modulated by the integrated reward history, such that shifts in reward statistics increase overall attention. (3) Attention weights are used to scale the feature values of each state. (4) The agent then uses attention-scaled features in state estimation, recalculating the surprise metric. If no state is below the second surprise threshold, the agent creates a new state. Otherwise, it selects the least surprising. Each state learns a set of values for the available actions in the task. (5) Once a state has been selected, that state’s values are used to choose an action. (6) The agent updates the state representation and the action value. It also tracks the reward history. c, In the ‘prototype states with discriminative attention’ (ProDAtt) model, states are defined by the mean and covariance of past examples, while in the ‘exemplar states with discriminative attention’ (ExDAtt) model, every past state exemplar is used. d, The surprise threshold determines whether a state is included in the context.
e, The attention feedback step uses mutual information to scale feature dimensions by their discriminative informativeness (left). Either the most proximal state is selected, or if the distance is greater than the threshold, the agent creates a new state (right, ProDAtt model example).
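Step (2) above, comparing a feature's entropy within each state to its entropy across states, can be sketched as follows (a toy Gaussian proxy in which log-variances stand in for entropies; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def attention_weights(states):
    """Toy mutual-information attention: features that vary little within a
    state but differ across states get high weight.
    `states` maps state name -> array of past examples (n_examples x n_features).
    For a Gaussian, entropy is monotone in log-variance, so variances serve
    as a stand-in for the within/across-state entropies."""
    all_examples = np.vstack(list(states.values()))
    across_var = all_examples.var(axis=0) + 1e-6                       # across states
    within_var = np.mean([ex.var(axis=0) for ex in states.values()],  # within states
                         axis=0) + 1e-6
    info = np.clip(np.log(across_var) - np.log(within_var), 0.0, None)
    return info / info.sum()   # normalized attention weights per feature

states = {
    "A": np.array([[1.0, 0.0, 0.5], [0.9, 0.1, 0.4]]),
    "B": np.array([[0.0, 1.0, 0.5], [0.1, 0.9, 0.6]]),
}
w = attention_weights(states)
# Features 1 and 2 separate A from B and receive high weight;
# feature 3 is uninformative and is down-weighted.
```

These weights would then scale each state's feature values before the surprise metric is recalculated, as in steps (3) and (4).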
Experiment 1. Human participants use discriminative attention when generalizing to novel examples
a, A model using distance to define state membership can fail to generalize if a new observation differs in a highly regular, but non-informative feature (left). A model using discriminative boundaries easily generalizes in this scenario (right). b, Participants learn the latent rules defining action-reward associations. During the tutorial, participants are instructed on stochastic rewards, that a single action can garner rewards for multiple states, and that a single state can be rewarded for multiple actions. On a given trial, participants encounter an ‘alien artefact’ activated with one of four actions. The main task is composed of two blocks. During block 1, they learn the latent states with an initial set of examples. The transition to the second block occurs without notice to the participant. During block 2, new examples are introduced that differ in a previously non-discriminative feature (left). c, Top: initially learned examples (present in blocks 1 and 2). Bottom: the novel generalization examples (introduced in block 2). Action D is never rewarded. d, Top: without discriminative attention, novel examples are separated from the initially learned examples by their texture. Bottom: with attention, novel examples are projected into the same feature space as the initially learned examples. e, Left: during the generalization block, differences in performance on the first appearance of initially learned versus novel examples measure generalization. Middle: the difference in error rates over the course of a session for discriminatively identical paired examples also quantifies generalization. Right: when first encountering a novel example, the choice of action D indicates a weak prior over state membership. f, Left: the First-Generalization-Appearance metric reveals that most participants generalize to new examples, while a tail in the distribution indicates individual variation. Middle: the Paired-Generalization difference shows similar results.
Right: exploration errors are below chance for most participants. First gen, First Generalization. g, Learning curves during initial state formation are similar across participants with different First-Generalization-Appearance scores, but diverge during generalization. The shaded areas represent the s.e.m. and the dotted lines denote chance levels.
Experiment 1. Varying algorithmic model use of top-down discriminative attention recapitulates the spectrum of human behaviour
a, In the ProDAtt and ExDAtt models, increasing the attention distortion parameter reduces sensitivity to attention feedback. This, in turn, increases the impact of variance in non-discriminative features. b, Learning curves from the ProDAtt model (top) and ExDAtt model (bottom) reveal that increasing attention distortion has no effect on initial learning, but separates the trajectory of initially learned and novel examples during generalization. For visualization, only the mean values are shown. c, Distorting attention in the ProDAtt model causes the magnitude of the first-generalization appearance difference to increase (left), along with that of the paired-generalization difference (middle). The proportion of exploration errors, however, remain below chance (right). d, The ExDAtt model shows the same trends. Shown are the mean ± s.e.m. for n = 100 agents.
Experiment 2. The majority of participants use previously non-discriminative features to generalize in a novel context. A subpopulation attends to one discriminative feature
a, A discriminative model can fail to generalize when a feature that is non-discriminative during learning becomes the key discriminative feature for later decisions. This produces a distinct ‘Discriminative Error’. b, The novel context generalization task version 1 (CG1), where participants learn distinct state-action contingencies sequentially in two explicitly signalled contexts before entering a third explicitly signalled generalization context where all latent states are active. c, A discriminative model that learns only shape (D1M) forms a single decision boundary (top left). Alternatively, a discriminative model could form a decision boundary involving colour and shape for context 1 states, and a separate decision boundary involving only shape for context 2 states (D2M, top right). A model attending to the prototype covariance results in distinct states (bottom left). If all features receive equal attention, they each occupy an independent location (bottom right). For visualization, the size dimension is not shown and stimuli are slightly offset. d, Human participants vary in the probability of initial generalization errors (top) and Discriminative Errors (bottom). Initial generalization errors quantify the probability that participants select the rewarded action the first time they encounter each example in the generalization context. e, Idealized attention models produce predictions for distributions of errors, visualized as confusion matrices. f, Participant subpopulations produce distinct confusion matrices. g, Fitting participants’ observed confusion matrices by the idealized attention models reveals distinct patterns in the coefficient weights. Shown are the mean ± 97% CI for all participants (n = 53), HDE participants (n = 8), LDE-HIE participants (n = 19) and LDE-LIE participants (n = 26). 
h, Both the model comparison (left) and partial correlation coefficients (right) reveal that participants above the Discriminative Error threshold are better fit by the D1M than the D2M. Model comparison shows the mean ± s.e.m. for Pareto smoothed importance sampling leave-one-out cross-validation. D1M, discriminative model with one feature; D2M, discriminative model with two features; I, intercept; P, prototype covariance; A, all-feature attention; All, all participants; HDE, high Discriminative Error; LDE, low Discriminative Error; HIE, high initial error; LIE, low initial error.


Human generalization of internal representations through prototype learning with goal-directed attention
  • Article
  • Publisher preview available

March 2023 · 264 Reads · 9 Citations · Nature Human Behaviour

The world is overabundant with feature-rich information obscuring the latent causes of experience. How do people approximate the complexities of the external world with simplified internal representations that generalize to novel examples or situations? Theories suggest that internal representations could be determined by decision boundaries that discriminate between alternatives, or by distance measurements against prototypes and individual exemplars. Each provides advantages and drawbacks for generalization. We therefore developed theoretical models that leverage both discriminative and distance components to form internal representations via action-reward feedback. We then developed three latent-state learning tasks to test how humans use goal-oriented discriminative attention and prototype/exemplar representations. The majority of participants attended to both goal-relevant discriminative features and the covariance of features within a prototype. A minority of participants relied only on the discriminative feature. The behaviour of all participants could be captured by parameterizing a model combining prototype representations with goal-oriented discriminative attention.


Human latent-state generalization through prototype learning with discriminative attention

December 2021 · 10 Reads

People cannot access the latent causes giving rise to experience. How, then, do they approximate the high-dimensional feature space of the external world with lower-dimensional internal models that generalize to novel examples or contexts? Here, we developed and tested a theoretical framework that internally identifies states by feature regularity (i.e., prototype states) and selectively attends to features according to their informativeness for discriminating between likely states. To test theoretical predictions, we developed experimental tasks where human subjects first learn, through reward feedback, internal models of the latent states governing actions associated with multi-feature stimuli. We then analyzed subjects’ response patterns to novel examples and contexts. These combined theoretical and experimental results reveal that the human ability to generalize actions involves the formation of prototype states with flexible deployment of top-down attention to discriminative features. These cognitive strategies underlie the human ability to generalize learned latent states in high-dimensional environments.


Excitatory-inhibitory tone shapes decision strategies in a hierarchical neural network model of multi-attribute choice

March 2021 · 110 Reads · 18 Citations

We are constantly faced with decisions between alternatives defined by multiple attributes, necessitating an evaluation and integration of different information sources. Time-varying signals in multiple brain areas are implicated in decision-making, but we lack a rigorous biophysical description of how basic circuit properties, such as excitatory-inhibitory (E/I) tone and cascading nonlinearities, shape attribute processing and choice behavior. Furthermore, how such properties govern choice performance under varying levels of environmental uncertainty is unknown. We investigated two-attribute, two-alternative decision-making in a dynamical, cascading nonlinear neural network with three layers: an input layer encoding choice alternative attribute values; an intermediate layer of modules processing separate attributes; and a final layer producing the decision. Depending on intermediate layer E/I tone, the network displays distinct regimes characterized by linear (I), convex (II) or concave (III) choice indifference curves. In regimes I and II, each option’s attribute information is additively integrated. In regime III, time-varying nonlinear operations amplify the separation between offer distributions by selectively attending to the attribute with the larger differences in input values. At low environmental uncertainty, a linear combination most consistently selects higher-valued alternatives. At high environmental uncertainty, however, regime III is more likely than a linear operation to select the higher-valued alternative. Furthermore, there are conditions where readout from the intermediate layer could be experimentally indistinguishable from that of the final layer. Finally, these principles are used to examine multi-attribute decisions in systems with reduced inhibitory tone, leading to predictions of different choice patterns and overall performance between individuals with reduced inhibitory tone and neurotypicals.


Figure 1: Network Schematics and Sample Trial. I: input; C: choice layer synaptic gating variable; T: intermediate transform layer synaptic gating variable; red: choice A; blue: choice B; black: attribute 1; white: attribute 2; arrows: excitation; circles: inhibition. Both networks include an input layer and a final choice layer. (A) The Linear Network consists of two layers, with attribute signals from the input layer directly transmitted to the final choice layer. (B) The Hierarchical Network includes an additional intermediate layer that performs a functional transformation on the attribute signals before passing them to the choice area. (C) A sample trial of the Hierarchical Network, with intermediate layer weights J+ = 0.34 nA and J− = −0.02 nA. For each area, the x-axis indicates time and the y-axis the population firing rate. The first vertical line indicates the onset of the offer value signal, and the second its termination.
Figure 2: Choice Performance Under Varying Environmental Uncertainty. The proportion of trials where the network chose option A, P(A), as a function of the difference in offer inputs is shown, along with a fitted sigmoid. Input IA,1 was varied while all other inputs were held fixed. On the x-axis, inputs were normalized (Î) by the linear sum of the maximum choice alternative offer values. (A) Performance of the Linear Network when attribute values are certain, with σηI = 0.0 Hz (magenta), and when attribute values are uncertain, with σηI = 0.75 Hz (green). (B) The same is given for the Hierarchical Network, with intermediate layer weights of J+ = 0.33 nA and J− = −0.03 nA.
Figure 3: Decision Regimes of Hierarchical Networks. I: input; Î: input normalized by the linear sum of the reference choice alternative values; T: transform layer synaptic gating variable; P(A): proportion of 1,000 trials where alternative A was chosen; J+: excitatory weight; J−: inhibitory weight; yellow: varied weights; forest green: regime II; tan: regime I; sky blue: regime III. The first subscript indicates choice A or B, and the second subscript x indicates a generic attribute. Weights were systematically varied identically in all intermediate layer areas. (A) A diagram of a generic intermediate layer area and its inputs. (B) Examples of the regime I (linear), regime II (convex) and regime III (concave) decision regimes. (C) The space of E/I weights is partitioned according to the decision regime of their indifference curves.
Figure 4: Hierarchical Network E/I Tone Governs Reward Performance Under Varying Uncertainty. (A) Heatmaps of the average reward-per-trial for different configurations of intermediate layer E/I weights. The x-axis is the intermediate layer inhibitory weight, and the y-axis is the excitatory weight. On the heatmap colorbar, 0 is the minimum possible reward-per-trial and 1 is the maximum. The dot on the colorbar indicates the performance of the Linear Network. Weight configurations in decision regime I are outlined in white on the heatmap; regime II weight configurations are to the upper left of those areas, and regime III to the right. The weight configuration achieving the most reward is indicated by the black "X". The left heatmap of (A) shows performance in the absence of environmental uncertainty (σηI = 0 Hz), while the right heatmap shows performance in the presence of high uncertainty (σηI = 1.5 Hz). (B) The optimal E/I weights shift as uncertainty increases. The x-axis is environmental uncertainty (σηI), the left y-axis shows the optimal excitatory weight and the right y-axis shows the optimal inhibitory weight. (C) The shift from regime I (tan) to regime III (sky blue; colors as in Figure 3C). The level of uncertainty is on the x-axis and the curvature of the fit to the indifference curve is on the y-axis. (D) A measure of the change in reward-per-trial as a function of noise for a regime I (J+ = 0.34, J− = −0.01), regime II (J+ = 0.36, J− = 0.0) and regime III (J+ = 0.35, J− = −0.10) weight configuration. The magnitude of the slope indicates the susceptibility to uncertainty. (E) The slopes of the lines fit in (D) for all weight configurations. The slope for the Linear Network is again indicated by the black dot.
Figure S1: Static Decision Models. (A) The linear model, where attribute input values (I) are summed into a choice value (CA or CB). Choice is then determined with a max operation. (B) The sequential max model with a max operation at the attribute level and the choice level. The surviving values from the first max operation are summed to calculate C. Two examples of the sequential max model are shown, where the larger attribute input values are bolded. On the left, both attribute inputs associated with A are larger, and so these are used to calculate the CA. Since the B attribute inputs are smaller, they do not survive the first stage, and so CB = 0. On the right, an attribute associated with each choice alternative survives the first stage, and so both C values are nonzero.
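The two static models in Figure S1 reduce to a few lines (a sketch following the caption's description; the 2x2 array layout and example values are assumptions):

```python
import numpy as np

def linear_choice(I):
    """Linear model: sum each alternative's attribute inputs into a choice
    value, then pick the larger. I is a 2x2 array: rows = alternatives
    (A, B), columns = attributes."""
    C = I.sum(axis=1)
    return ("A" if C[0] >= C[1] else "B"), C

def sequential_max_choice(I):
    """Sequential max model: a max operation first at the attribute level
    (only the larger input per attribute survives), then the surviving
    values are summed and compared at the choice level."""
    survives = I == I.max(axis=0)      # per-attribute winner mask
    C = (I * survives).sum(axis=1)
    return ("A" if C[0] >= C[1] else "B"), C

# Illustrative offers where the two models disagree: B wins on the linear
# sum, but A wins both... no, A wins attribute 1 and B wins attribute 2.
I = np.array([[0.6, 0.2],    # alternative A: attribute 1, attribute 2
              [0.5, 0.4]])   # alternative B

print(linear_choice(I)[0])          # "B" (0.8 vs 0.9)
print(sequential_max_choice(I)[0])  # "A" (0.6 vs 0.4)
```

With these inputs the linear model favours B (larger total), while the sequential max model favours A, whose surviving attribute value is larger, illustrating how the two static readouts can disagree on the same offers.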
Hierarchical Network Model Excitatory-Inhibitory Tone Shapes Alternative Strategies for Different Degrees of Uncertainty in Multi-Attribute Decisions

January 2020 · 95 Reads · 2 Citations

We investigated two-attribute, two-alternative decision-making in a hierarchical neural network with three layers: an input layer encoding choice alternative attribute values; an intermediate layer of modules processing separate attributes; and a choice layer producing the decision. Depending on intermediate layer excitatory-inhibitory (E/I) tone, the network displays three distinct regimes characterized by linear (I), convex (II) or concave (III) choice indifference curves. In regimes I and II, each option's attribute information is additively integrated. To maximize reward at low environmental uncertainty, the system should operate in regime I. At high environmental uncertainty, reward maximization is achieved in regime III, with each attribute module selecting a favored alternative and the ultimate decision based upon a comparison between the outputs of the attribute processing modules. We then use these principles to examine multi-attribute decisions with autism-related deficits in E/I balance, leading to predictions of different choice patterns and overall performance between autistic and neurotypical individuals.

Citations (3)


... The existence of spatial "cognitive maps" which efficiently organize knowledge is well-documented in the literature [45][46][47][48][49] . These maps guide attention and learning across different domains 45,46,50 . While two-dimensional representations are often emphasized in cognitive tasks, research suggests that cognitive representations are multi-dimensional and can be compressed or unfolded based on task demands 17,51 . ...

Reference:

Feature identification learning both shapes and is shaped by spatial object-similarity representations
Human generalization of internal representations through prototype learning with goal-directed attention

Nature Human Behaviour

... Under the notion that uncertainty requires exploration of a larger space of options, we argue that this is akin to a lower learning rate for an individual feature at the benefit of distributed learning across uncertain features. Non-selective gain increases, e.g., provided by global arousal, can favor such distributed learning 108 . We observe that pupil sensitivity to rising uncertainty is retained across the adult lifespan but dampens in older age. ...

Excitatory-inhibitory tone shapes decision strategies in a hierarchical neural network model of multi-attribute choice

... Robust inhibition and long inhibitory time constants should contribute to the extension of the time window for signal summation and thus extend local temporal receptive fields. In artificial hierarchical networks, environmental uncertainty can be dynamically captured by variations of the E/I tone (Pettine, Louie, Murray, & Wang, 2020). In humans, dynamical integration of environmental uncertainty is circumscribed to the MCC (Behrens et al., 2007). ...

Hierarchical Network Model Excitatory-Inhibitory Tone Shapes Alternative Strategies for Different Degrees of Uncertainty in Multi-Attribute Decisions