Ishita Dasgupta’s research while affiliated with Mountain View College and other places


Publications (39)


Meta-learning: Data, architecture, and both
  • Article

September 2024 · 6 Reads · 1 Citation · Behavioral and Brain Sciences

Marcel Binz · Ishita Dasgupta · Akshay Jagadish · [...]

We are encouraged by the many positive commentaries on our target article. In this response, we recapitulate some of the points raised and identify synergies between them. We have arranged our response based on the tension between data and architecture that arises in the meta-learning framework. We additionally provide a short discussion that touches upon connections to foundation models.


Language models, like humans, show content effects on reasoning tasks

July 2024 · 62 Reads · 68 Citations · PNAS Nexus

Abstract reasoning is a key ability for an intelligent system. Large language models (LMs) achieve above-chance performance on abstract reasoning tasks but exhibit many imperfections. However, human abstract reasoning is also imperfect. Human reasoning is affected by our real-world knowledge and beliefs, and shows notable “content effects”; humans reason more reliably when the semantic content of a problem supports the correct logical inferences. These content-entangled reasoning patterns are central to debates about the fundamental nature of human intelligence. Here, we investigate whether language models—whose prior expectations capture some aspects of human knowledge—similarly mix content into their answers to logic problems. We explored this question across three logical reasoning tasks: natural language inference, judging the logical validity of syllogisms, and the Wason selection task. We evaluate state-of-the-art LMs, as well as humans, and find that the LMs reflect many of the same qualitative human patterns on these tasks—like humans, models answer more accurately when the semantic content of a task supports the logical inferences. These parallels are reflected in accuracy patterns, and in some lower-level features like the relationship between LM confidence over possible answers and human response times. However, in some cases the humans and models behave differently—particularly on the Wason task, where humans perform much worse than large models, and exhibit a distinct error pattern. Our findings have implications for understanding possible contributors to these human cognitive effects, as well as the factors that influence language model performance.
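As a rough illustration of how such an evaluation can be set up (this is not the paper's code; the model name "gpt2" and the prompt wording are placeholders I chose), one can compare the log-probability an LM assigns to each candidate answer of a syllogism whose conclusion conflicts with world knowledge:

```python
# Illustrative sketch only: score an LM's preference for "valid" vs. "invalid"
# on a syllogism whose conclusion is inconsistent with world knowledge.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in for the much larger models evaluated in the paper
tok = AutoTokenizer.from_pretrained(MODEL)
lm = AutoModelForCausalLM.from_pretrained(MODEL)
lm.eval()

def answer_logprob(prompt: str, answer: str) -> float:
    """Sum of token log-probabilities the LM assigns to `answer` given `prompt`.
    Assumes the prompt tokenization is a prefix of the prompt+answer tokenization."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = torch.log_softmax(lm(full_ids).logits, dim=-1)
    total = 0.0
    # Score only the answer tokens, each predicted from the preceding context.
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        total += logprobs[0, pos - 1, full_ids[0, pos]].item()
    return total

syllogism = (
    "Argument: All birds can fly. Penguins are birds. Therefore penguins can fly.\n"
    "Is this argument logically valid or invalid? Answer:"
)
for option in (" valid", " invalid"):
    print(option.strip(), answer_logprob(syllogism, option))
```

Running the same scoring on a matched syllogism whose conclusion is believable, and comparing the gap between the two log-probabilities, is one simple way to quantify a content effect of the kind the abstract describes.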


Meta-Learned Models of Cognition

November 2023 · 35 Reads · 23 Citations · Behavioral and Brain Sciences

Psychologists and neuroscientists extensively rely on computational models for studying and analyzing the human mind. Traditionally, such computational models have been hand-designed by expert researchers. Two prominent examples are cognitive architectures and Bayesian models of cognition. While the former requires the specification of a fixed set of computational structures and a definition of how these structures interact with each other, the latter necessitates the commitment to a particular prior and a likelihood function which – in combination with Bayes’ rule – determine the model's behavior. In recent years, a new framework has established itself as a promising tool for building models of human cognition: the framework of meta-learning. In contrast to the previously mentioned model classes, meta-learned models acquire their inductive biases from experience, i.e., by repeatedly interacting with an environment. However, a coherent research program around meta-learned models of cognition is still missing to this day. The purpose of this article is to synthesize previous work in this field and establish such a research program. We accomplish this by pointing out that meta-learning can be used to construct Bayes-optimal learning algorithms, allowing us to draw strong connections to the rational analysis of cognition. We then discuss several advantages of the meta-learning framework over traditional methods and reexamine prior work in the context of these new insights.
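To make the Bayes-optimality claim concrete, here is a minimal meta-learning sketch of my own (not the authors' code): an RNN trained across many coin-flip tasks whose bias is drawn from a uniform prior ends up making one-step-ahead predictions close to the Bayes-optimal posterior mean (heads + 1) / (trials + 2).

```python
# Minimal meta-learning sketch (my illustration, not the authors' code): the network
# acquires its inductive bias purely from experience across tasks.
import torch
import torch.nn as nn

torch.manual_seed(0)
SEQ_LEN, BATCH, HIDDEN = 20, 256, 32

class MetaLearner(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, 1)

    def forward(self, x):                    # x: [batch, seq, 1] of 0/1 flips
        h, _ = self.rnn(x)
        return torch.sigmoid(self.head(h))   # predicted P(next flip = 1)

model = MetaLearner()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    bias = torch.rand(BATCH, 1, 1)                        # task prior: Beta(1, 1)
    flips = (torch.rand(BATCH, SEQ_LEN, 1) < bias).float()
    inputs, targets = flips[:, :-1], flips[:, 1:]
    loss = nn.functional.binary_cross_entropy(model(inputs), targets)
    opt.zero_grad(); loss.backward(); opt.step()

# Compare the meta-learned prediction with the Bayesian posterior mean on one task.
test = (torch.rand(1, SEQ_LEN, 1) < 0.8).float()
with torch.no_grad():
    pred = model(test[:, :-1])[0, -1, 0]
heads, n = test[0, :-1, 0].sum(), SEQ_LEN - 1
print(f"meta-learned: {pred:.3f}  Bayes-optimal: {(heads + 1) / (n + 2):.3f}")
```

With enough training tasks the two printed values should roughly agree, which is the sense in which the meta-learned network approximates a Bayes-optimal learner under the task prior.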



Tile-revealing task
(A) An agent sequentially reveals tiles to uncover a shape on a 2D grid. Each task is defined by the underlying board. (B) The underlying boards are sampled from a specific abstraction, which defines a distribution of boards based on an abstract rule. More examples of boards from each distribution can be found in S1 Fig.
Metamers for abstract tasks
(A) Generating metamer distributions. We train a neural network to predict randomly held-out tiles of samples from an abstract rule-based distribution. We then use this network to perform Gibbs sampling. This was done separately for each abstraction. (B) Samples from each abstraction’s corresponding metamer distribution. (C) First-, second-, and third-order statistics of boards within each abstract task distribution and its corresponding metamer distribution, showing the statistical similarity between the distributions. The first-order statistic is the number of red tiles minus the number of blue tiles. The second-order statistic is the number of matching (i.e., same-color) neighbor pairs minus the number of non-matching pairs. The third-order statistic is the number of matching “triples” (i.e., a tile, its neighbor, and its neighbor’s neighbor) minus the number of non-matching triples. The statistics did not differ significantly between the abstract and metamer distributions at any of the three levels, with only two exceptions, in the second- and third-order statistics of the pyramid and symmetry distributions.
(A) Performance of human and neural network agents across all abstractions and their metamers in the tile-revealing task. Each dot represents an individual human's or agent's mean across tasks in the corresponding task distribution, and error bars represent 95% confidence intervals across humans or agents. Performance is the number of blue tiles revealed, z-scored against a nearest-neighbor heuristic, so a lower number reflects better performance. Inset plots each show the difference between abstract and metamer performance (higher is better for abstract). The difference between abstract- and metamer-task performance is typically larger for humans than for the agent. (B) Examples of specific choices for a particular board (upper panels) made by humans (green squares, middle panels) versus agents (purple squares, lower panels) in a subset of the abstractions. In all cases, the agent chooses an action (with high confidence) that violates the rule, whereas the majority of humans choose the action consistent with the rule.
Results from two-sample independent t-tests comparing metamer vs. abstract performance for other neural network architectures
Humans typically do better on the abstract task distributions (as evidenced by a positive t value), whereas the agents typically do worse (as evidenced by a negative t value). See S7 Fig for performance of humans and different neural network architectures across all abstractions and their metamers. Humans typically show a larger difference than the agents, but this is not always the case.
Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning
  • Article
  • Full-text available

August 2023 · 38 Reads · 5 Citations

The ability to acquire abstract knowledge is a hallmark of human intelligence and is believed by many to be one of the core differences between humans and neural network models. Agents can be endowed with an inductive bias towards abstraction through meta-learning, where they are trained on a distribution of tasks that share some abstract structure that can be learned and applied. However, because neural networks are hard to interpret, it can be difficult to tell whether agents have learned the underlying abstraction, or alternatively statistical patterns that are characteristic of that abstraction. In this work, we compare the performance of humans and agents in a meta-reinforcement learning paradigm in which tasks are generated from abstract rules. We define a novel methodology for building “task metamers” that closely match the statistics of the abstract tasks but use a different underlying generative process, and evaluate performance on both abstract and metamer tasks. We find that humans perform better at abstract tasks than metamer tasks whereas common neural network architectures typically perform worse on the abstract tasks than the matched metamers. This work provides a foundation for characterizing differences between humans and machine learning that can be used in future work towards developing machines with more human-like behavior.
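The metamer recipe described in the figure captions above can be sketched roughly as follows (my reconstruction, not the released code; the "one red column" rule, network size, and sampling settings are toy choices): train a network to predict randomly held-out tiles from the visible ones, then Gibbs-sample boards from that learned conditional.

```python
# Rough sketch of the task-metamer construction, under toy assumptions.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0); np.random.seed(0)
SIDE = 4
N = SIDE * SIDE

def sample_rule_board():
    """Toy abstract rule (stand-in for the paper's abstractions): one column is red (1)."""
    board = np.zeros((SIDE, SIDE), dtype=np.float32)
    board[:, np.random.randint(SIDE)] = 1.0
    return board.reshape(-1)

# Step 1: train a network to predict randomly held-out tiles from the visible ones.
net = nn.Sequential(nn.Linear(2 * N, 64), nn.ReLU(), nn.Linear(64, N))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(3000):
    boards = torch.from_numpy(np.stack([sample_rule_board() for _ in range(64)]))
    mask = (torch.rand_like(boards) < 0.2).float()             # hide ~20% of the tiles
    inputs = torch.cat([boards * (1 - mask), mask], dim=1)      # visible values + hidden-tile mask
    loss = nn.functional.binary_cross_entropy_with_logits(
        net(inputs)[mask.bool()], boards[mask.bool()])
    opt.zero_grad(); loss.backward(); opt.step()

# Step 2: Gibbs-sample "metamer" boards from the learned conditionals.
@torch.no_grad()
def gibbs_metamer(sweeps: int = 50) -> np.ndarray:
    board = torch.bernoulli(torch.full((N,), 0.25))             # random initial board
    for _ in range(sweeps):
        for i in np.random.permutation(N):
            mask = torch.zeros(N); mask[i] = 1.0
            inp = torch.cat([board * (1 - mask), mask]).unsqueeze(0)
            board[i] = torch.bernoulli(torch.sigmoid(net(inp))[0, i])
    return board.reshape(SIDE, SIDE).numpy()

print(gibbs_metamer())   # statistically similar to rule boards, but not rule-generated
```

The point of the construction is that the sampled boards match low-order statistics of the rule-based boards without being generated by the rule itself, which is what makes them useful as a control condition.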


Passive learning of active causal strategies in agents and language models

May 2023 · 25 Reads

What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long as the agent can intervene at test time. We formally illustrate that learning a strategy of first experimenting, then seeking goals, can allow generalization from passive learning in principle. We then show empirically that agents trained via imitation on expert data can indeed generalize at test time to infer and use causal links which are never present in the training data; these agents can also generalize experimentation strategies to novel variable sets never observed in training. We then show that strategies for causal intervention and exploitation can be generalized from passive data even in a more complex environment with high-dimensional observations, with the support of natural language explanations. Explanations can even allow passive learners to generalize out-of-distribution from perfectly-confounded training data. Finally, we show that language models, trained only on passive next-word prediction, can generalize causal intervention strategies from a few-shot prompt containing examples of experimentation, together with explanations and reasoning. These results highlight the surprising power of passive learning of active causal strategies, and may help to understand the behaviors and capabilities of language models.
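The core "experiment first, then seek goals" strategy can be illustrated with a deliberately tiny toy environment (my own construction, unrelated to the paper's actual environments): the agent intervenes once on each candidate variable, infers the causal link from the outcomes, and then exploits it.

```python
# Toy illustration of "experiment, then exploit"; the environment and reward are invented.
import random

random.seed(0)

def make_environment(n_vars: int = 4):
    """Hidden causal structure: exactly one variable drives the target."""
    cause = random.randrange(n_vars)
    def intervene(var: int) -> int:
        # do(var = 1): the target responds only if `var` is the true cause
        return 1 if var == cause else 0
    return intervene, cause

def experiment_then_exploit(intervene, n_vars: int = 4, n_exploit: int = 10) -> int:
    # Phase 1: experimentation -- try each intervention once, record the outcome.
    outcomes = {var: intervene(var) for var in range(n_vars)}
    inferred_cause = max(outcomes, key=outcomes.get)
    # Phase 2: goal-seeking -- repeatedly act on the inferred causal link.
    return sum(intervene(inferred_cause) for _ in range(n_exploit))

intervene, true_cause = make_environment()
reward = experiment_then_exploit(intervene)
print(f"true cause: {true_cause}, reward collected in exploitation phase: {reward}")
```

The paper's agents acquire this two-phase strategy purely from passive expert data; the sketch only shows why the strategy generalizes: once the causal link is identified by intervention at test time, exploitation no longer depends on the training distribution.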


(a) Task structure for both experiments. Six training blocks are sandwiched between two baseline and two test blocks. The baseline and test blocks contain sequences generated from the “illusory” transition matrix in (c). (b) Participants are instructed to press the corresponding key on the keyboard according to trial-by-trial displayed instructions. They are given feedback on their performance, including accuracy and reaction times, before the subsequent trial. (c) A non-deterministic, “illusory” transition matrix over the four possible key presses is used to generate sequences for the baseline and test blocks in both experiments. The generative transition matrix, with two high (from A to B, C to D) and two medium transition probabilities (from B to C, D to A), produces “illusory” chunks that can be perceived as frequently occurring. To control for the effect of habitual presses from consecutive fingers, a random mapping from “A”, “B”, “C”, “D” to “D”, “F”, “J”, “K” is generated independently for each participant. (d) The instructions for training blocks differed between the two experiments and corresponding groups. In Experiment 1, participants were divided into three groups who learned independent, size-2, and size-3 chunks from a predefined set of chunks with equal probability. In Experiment 2, the sequences in the training blocks were also generated from the “illusory” transition matrix. One group was instructed to act as accurately as possible and the other group was instructed to act as fast as possible.
(a) Chunking mechanism of the rational model. The model keeps track of marginal and transitional probabilities among every pair of pre-existing chunks, and proposes the chunk pair that yields the greatest joint probability as the next candidate to be chunked together. At the start, the four different keys are initialized as the primitive chunks. A loss function that trades off reaction times and accuracy is evaluated on the pre-existing set of chunks. If a chunk update reduces the loss function, then the two pre-existing chunks are combined. A parameter w determines how much more the model weighs a decrease in reaction times compared to an increase in accuracy. (b) Example model simulations of learning sequences in Experiment 1. A, B, C, D are randomly mapped to D, F, J, K for individual participants. Because the transition AB occurred frequently, the model proposes this transition as a possible chunk. (c) Model simulation for Experiment 1. Bars represent the probability of a particular chunk being parsed in a simulation over the whole experiment. The bars for the independent group on chunk AB, the independent and size-2 groups on chunk BC, and the independent and size-2 groups on chunk ABC have probability 0 and are therefore not visible in the graph. Note that these bars can be arbitrarily increased by changing w while the qualitative results remain the same. (d) Model simulation for Experiment 2. Top: Average chunk length of different simulations when increasing w from 0 (optimizing only accuracy) to 1 (optimizing only speed). As w increases, the average chunk length increases, indicating that the model learns longer chunks when asked to care more about acting fast. Bottom: Transition probabilities learned by the model with w = 0 and w = 1, corresponding to the rational maximization of accuracy and speed, respectively. If the model tries to act as accurately as possible, it recovers the true transition probabilities of the “illusory” transition matrix. If the model tries to act as fast as possible, it sets the medium and high transition probabilities to 1, i.e. deterministic. All results are averaged across 120 independent simulations. Error bars represent the standard error of the mean.
Results of Experiment 1. (a) Manipulation check. The number of chunks AB and ABC learned by participants during the training blocks, by group. Chunks were retrieved using a categorization of between- and within-chunk transitions from a mixture-of-Gaussians analysis of participants’ reaction times. (b) Chunky Boost of size-2 chunks AB and BC, by group. A chunky boost is measured by the relative change of Cohen’s d between baseline and test blocks for the high- and medium-probability transitions. (c) Chunkiness of size-3 chunks ABC. Chunkiness is measured by the relative change of Wasserstein distance between the baseline and test blocks of between-chunk reaction times of all possible size-3 chunks. (d) Regression coefficients of interaction effects between condition and size-2, size-3, and true transition probabilities on reaction times during the final test blocks. (e) Chunk increase from the baseline to the test blocks, by group, for chunk AB and chunk ABC. Chunk increase is measured by the number of returned chunks from the mixture-of-Gaussians analysis. (f) Chunk reuse probability by group. Chunk reuse probability was calculated based on whether or not part of an earlier chunk was used in a later chunk occurring within the next 30 trials. For all plots, error bars indicate the standard error of the mean.
Results of Experiment 2. (a) Manipulation check. Average reaction times and average response accuracy during training blocks, by group. (b) Chunky Boost of size-2 chunks, measured by the change of Cohen’s d between baseline and test blocks, by group. The size-2 chunks include AB, BC, CD, and DA. (c) Chunkiness, measured by the relative change of Wasserstein distance between the baseline and test blocks for size-3 chunks, including ABC, BCD, CDA, and DAB. (d) Coefficient of the interaction effect between chunky and true transition probabilities on reaction times during the test blocks. (e) Chunk increase from the baseline to the test blocks, by condition, for size-2 chunks (AB, CD, BC, DA) and size-3 chunks (ABC, BCD, CDA, DAB). (f) Chunk reuse probability by group. For all plots, error bars indicate the standard error of the mean.
Chunking as a rational solution to the speed–accuracy trade-off in a serial reaction time task

May 2023 · 111 Reads · 2 Citations

When exposed to perceptual and motor sequences, people are able to gradually identify patterns within them and form a compact internal description of the sequence. One proposal for how sequences are compressed is that people form chunks. We study people’s chunking behavior in a serial reaction time task. We relate chunk representation to sequence statistics and task demands, and propose a rational model of chunking that rearranges and concatenates its representation to jointly optimize for accuracy and speed. Our model predicts that participants should chunk more if chunks are indeed part of the generative model underlying a task and should, on average, learn longer chunks when optimizing for speed than when optimizing for accuracy. We test these predictions in two experiments. In the first experiment, participants learned sequences with underlying chunks. In the second experiment, participants were instructed to act either as fast or as accurately as possible. The results of both experiments confirmed our model’s predictions. Taken together, these results shed new light on the benefits of chunking and pave the way for future studies on step-wise representation learning in structured domains.
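A schematic version of the proposed chunking model, as I read the description above (the loss function and acceptance rule here are stand-ins, not the paper's exact formulation): chunks are merged when the most probable chunk pair reduces a speed-accuracy loss weighted by a parameter w.

```python
# Schematic sketch of a rational chunking step; chunks are tuples of keys.
from collections import Counter

def propose_and_evaluate(sequence, chunks, w=0.5):
    """One update step: propose the most probable chunk pair and score the merge."""
    # Parse the sequence greedily into the current chunk inventory (longest match first).
    parsed, i = [], 0
    while i < len(sequence):
        match = max((c for c in chunks if tuple(sequence[i:i + len(c)]) == c), key=len)
        parsed.append(match); i += len(match)
    # Estimate marginal and pairwise (transition) frequencies over the parse.
    marginals = Counter(parsed)
    pairs = Counter(zip(parsed, parsed[1:]))
    candidate = max(pairs, key=pairs.get)
    # Stand-in loss: w rewards fewer retrievals (speed), (1 - w) penalizes merging a
    # pair that is not reliably predictive (accuracy). Not the paper's exact form.
    p_pair = pairs[candidate] / max(1, marginals[candidate[0]])
    speed_gain = pairs[candidate] / len(parsed)
    accuracy_cost = 1.0 - p_pair
    accept = w * speed_gain > (1 - w) * accuracy_cost
    if accept:
        chunks = chunks | {candidate[0] + candidate[1]}
    return chunks, candidate, accept

sequence = list("ABCDABCDABAB")
chunks = {(k,) for k in "ABCD"}                 # primitive chunks
for _ in range(3):
    chunks, candidate, accepted = propose_and_evaluate(sequence, chunks)
    print(f"proposed merge {candidate}, accepted={accepted}")
print("final chunk inventory:", sorted(chunks, key=len, reverse=True))
```

On this toy sequence the model accepts the AB merge (it is perfectly predictive) and rejects further merges; raising w makes it tolerate less predictive, longer chunks, mirroring the qualitative prediction that chunking depends on both sequence statistics and the speed-accuracy trade-off.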


Meta-Learned Models of Cognition

April 2023 · 159 Reads

Meta-learning is a framework for learning learning algorithms through repeated interactions with an environment as opposed to designing them by hand. In recent years, this framework has established itself as a promising tool for building models of human cognition. Yet, a coherent research program around meta-learned models of cognition is still missing. The purpose of this article is to synthesize previous work in this field and establish such a research program. We rely on three key pillars to accomplish this goal. We first point out that meta-learning can be used to construct Bayes-optimal learning algorithms. This result not only implies that any behavioral phenomenon that can be explained by a Bayesian model can also be explained by a meta-learned model but also allows us to draw strong connections to the rational analysis of cognition. We then discuss several advantages of the meta-learning framework over traditional Bayesian methods. In particular, we argue that meta-learning can be applied to situations where Bayesian inference is impossible and that it enables us to make rational models of cognition more realistic, either by incorporating limited computational resources or neuroscientific knowledge. Finally, we reexamine prior studies from psychology and neuroscience that have applied meta-learning and put them into the context of these new insights. In summary, our work highlights that meta-learning considerably extends the scope of rational analysis and thereby of cognitive theories more generally.


Figure 2: Results. (A) Performance on the secret property conditional and secret property search tasks with different Planners and baseline RL. (B) Robustness of the Planners under an imperfect Reporter on the secret property search task. (C) Improvement in performance as a Reporter is trained on the Visual conditional task. All error bars are CIs across multiple episodes.
Excerpt (Section 3.3, Robustness to irrelevant reports): We saw in the search task from the previous section that the 70B Planner is reasonably robust to mistakes from the Actor (e.g., Section A.1). In this section, we examine whether it can also be robust to a noisy Reporter. We break the assumption that only task-relevant actions in the environment are reported: irrelevant actions, e.g., "I have moved left" / "I have moved up and right", are reported 20% of the time.
Figure 3: Pure RL baselines. See main text for details.
Collaborating with language models for embodied reasoning

February 2023 · 57 Reads · 2 Citations

Reasoning in a complex and ambiguous environment is a key goal for Reinforcement Learning (RL) agents. While some sophisticated RL agents can successfully solve difficult tasks, they require a large amount of training data and often struggle to generalize to new unseen environments and new tasks. On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the ability to adapt to new tasks through in-context learning. However, LSLMs do not inherently have the ability to interrogate or intervene on the environment. In this work, we investigate how to combine these complementary abilities in a single system consisting of three parts: a Planner, an Actor, and a Reporter. The Planner is a pre-trained language model that can issue commands to a simple embodied agent (the Actor), while the Reporter communicates with the Planner to inform its next command. We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot, investigate failure cases, and demonstrate how components of this system can be trained with reinforcement learning to improve performance.
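The Planner-Actor-Reporter loop can be sketched as follows (my own stub; `call_language_model`, the toy world, and the commands are placeholders rather than the system described in the paper):

```python
# Schematic Planner-Actor-Reporter loop; the LM call is hard-coded so the example runs.
def call_language_model(prompt: str) -> str:
    """Placeholder Planner: a real system would query a pretrained LM here."""
    if "Reporter: the object is blue" in prompt:
        return "pick up the blue object"
    return "examine the object"

def actor(command: str, world: dict) -> str:
    """Simple embodied Actor: executes the Planner's command in a toy world."""
    if command == "examine the object":
        return f"the object is {world['color']}"
    if command.startswith("pick up"):
        world["held"] = True
        return "object picked up"
    return "command not understood"

def run_episode(world: dict, max_steps: int = 4) -> None:
    transcript = "Task: pick up the object only after checking its color.\n"
    for _ in range(max_steps):
        command = call_language_model(transcript + "Planner:")
        observation = actor(command, world)                              # Actor acts
        transcript += f"Planner: {command}\nReporter: {observation}\n"   # Reporter feeds back
        if world.get("held"):
            break
    print(transcript)

run_episode({"color": "blue", "held": False})
```

The property worth noting, mirrored from the abstract, is that the Planner never touches the environment directly: it only sees the Reporter's text and emits the next command.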


Distilling Internet-Scale Vision-Language Models into Embodied Agents

January 2023 · 66 Reads

Instruction-following agents must ground language into their observation and action spaces. Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents. We combine ideas from model distillation and hindsight experience replay (HER), using a VLM to retroactively generate language describing the agent's behavior. Simple prompting allows us to control the supervision signal, teaching an agent to interact with novel objects based on their names (e.g., planes) or their features (e.g., colors) in a 3D rendered environment. Few-shot prompting lets us teach abstract category membership, including pre-existing categories (food vs. toys) and ad-hoc ones (arbitrary preferences over objects). Our work outlines a new and effective way to use internet-scale VLMs, repurposing the generic language grounding acquired by such models to teach task-relevant groundings to embodied agents.
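The hindsight-relabeling idea can be illustrated with a small sketch (my own; `vlm_describe` is a stub standing in for a prompted VLM, and the episode format is invented for the example): the VLM's description of what the agent actually did replaces the original instruction, turning failed episodes into useful supervised examples.

```python
# Hindsight relabeling with a (stubbed) VLM captioner, in the spirit of the abstract above.
from dataclasses import dataclass
from typing import List

@dataclass
class Episode:
    frames: List[str]        # stand-in for image observations
    actions: List[str]
    instruction: str         # the instruction the agent was *trying* to follow

def vlm_describe(frames: List[str]) -> str:
    """Placeholder for a pretrained VLM prompted to caption the agent's behavior."""
    return f"go to the {frames[-1]}"

def hindsight_relabel(episode: Episode) -> Episode:
    """Replace the original (possibly failed) instruction with the VLM's description."""
    achieved = vlm_describe(episode.frames)
    return Episode(episode.frames, episode.actions, instruction=achieved)

# Usage: a failed attempt to reach the plane becomes a valid example for the red block.
failed = Episode(frames=["room", "red block"], actions=["forward", "left"],
                 instruction="go to the plane")
print(hindsight_relabel(failed).instruction)   # supervision now matches the behavior
```

Any episode, successful or not, can be relabeled this way, which is what lets the VLM's internet-scale grounding be distilled into the agent without additional human annotation.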


Citations (18)


... Previous research has shown that in some classic experiments in cognitive psychology, LLMs can exhibit similar behavior to humans [39] and demonstrate human-like content effects in logical reasoning tasks [40]. Therefore, it is plausible that LLMs can establish word associations similar to humans, enabling metaphor understanding. When people hear a metaphor, they are not only comprehending its meaning but also attempting to deduce additional information, such as the true nature of the entity being described. ...

Reference:

Towards Multimodal Metaphor Understanding: A Chinese Dataset and Model for Metaphor Mapping Identification
Language models, like humans, show content effects on reasoning tasks
  • Citing Article
  • July 2024

PNAS Nexus

... For example, in the case of medical decision-making requiring professional expertise and cognitive skills, AI systems demonstrated significantly better diagnosis accuracy when they learned patterns from concrete examples than from individual rules and skill sets (Yamazaki et al., 2023). Recent research on meta-learning, which is about generalizing learning outcomes across different cognitive domains, suggests that input from diverse examples is essential for developing domain-general prediction and learning abilities (Binz et al., 2023; Han, under review). Moreover, learning by example, not abstract rule, promotes the development of complicated cognitive capacities in AI (FeldmanHall & Lamba, 2023; Pedersen & Johansen, 2020). ...

Meta-Learned Models of Cognition
  • Citing Article
  • November 2023

Behavioral and Brain Sciences

... In-context learning allows large language models (LLMs) to adapt to tasks by leveraging context provided in input prompts [11]. This approach differs from conventional methods that require explicit model fine-tuning, as LLMs can instead use contextual information without modifying their underlying parameters [5,26,50]. Prompts, typically in textual form, provide scenarios or questions, guiding LLMs to generate relevant responses [58]. ...

Can language models learn from explanations in context?
  • Citing Conference Paper
  • January 2022

... However, there is preliminary evidence that LLMs do have something like a world model and causal understanding (Burns et al. 2022; Kıcıman et al. 2023; Li et al. 2023; Meng et al. 2023; though see: Thibodeau 2022). Finally, there are some initial successes in connecting LLMs to embodied agents (Dasgupta et al. 2023; Huang et al. 2022a, 2022b). While this brief survey is by no means the last word, I think the burden of proof is currently on the side of researchers arguing that LLMs won't scale to AGI to specify what precise capacities they lack and why those capacities are unlikely to emerge with increases in scale. ...

Collaborating with language models for embodied reasoning

... The advent of large language models (LLMs) has triggered intense interest in whether these new AI models are in fact approaching human-level abilities in language understanding (DiStefano et al., 2024; Köbis & Mossink, 2021; Mahowald et al., 2024; McClelland et al., 2020) and various forms of reasoning (Binz & Schulz, 2023; Chan et al., 2022; Dasgupta et al., 2022; Srivastava et al., 2022; Wei et al., 2022), including analogy (Webb et al., 2023). Given the enormous and non-curated text corpora on which LLMs have been trained, these models have certainly had ample opportunity to mine the metaphors that humans have already formed and planted in texts. ...

Language models show human-like content effects on reasoning
  • Citing Preprint
  • July 2022

... This was not the case: As the number of points in a stimulus increased, people held neither the number of clusters nor the number of points per cluster constant; rather, both quantities increased linearly. This is partly consistent with the rational account of clustering as minimizing representational complexity (Dasgupta & Griffiths, 2022). ...

Clustering and the efficient use of cognitive resources
  • Citing Article
  • August 2022

Journal of Mathematical Psychology

... Due to the difficulties of aligning LMs to a set of beliefs (Hendrycks et al., 2021; Arora et al., 2023), constraining them to predict in a fair manner (Nabi et al., 2022), or simply defining a fair model, is an exceedingly difficult task (Kumar et al., 2022). Along the same lines go fairness definition and evaluation. ...

Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines

... They have demonstrated promising results across a wide range of tasks, including tasks that require specialized scientific knowledge and reasoning 12,62 . Perhaps the most interesting aspect of these LLMs is their in-context few-shot abilities, which adapt these models to diverse tasks without gradient-based parameter updates 15,67,80,81 . This allows them to rapidly generalize to unseen tasks and even exhibit apparent reasoning abilities with appropriate prompting strategies 13,16,20,63 . ...

Can language models learn from explanations in context?
  • Citing Preprint
  • April 2022