All content in this area was uploaded by Ty Malloy on Aug 21, 2023
Generative Environment-Representation
Instance-Based Learning: A Cognitive Model
Tyler Malloy,¹ Yinuo Du,¹,² Fei Fang,² Cleotilde Gonzalez¹
¹,² Carnegie Mellon University
¹ Department of Social and Decision Sciences
² Software and Societal System Department
5000 Forbes Ave, Pittsburgh PA, USA
Abstract
Instance-Based Learning Theory (IBLT) suggests that
humans learn to engage in dynamic decision making
tasks through the accumulation of experiences, repre-
sented by the decision task features, the actions per-
formed, and the utility of decision outcomes. This the-
ory has been applied to the design of Instance-Based
Learning (IBL) models of human behavior in a variety
of contexts. One key feature of all IBL model appli-
cations is the method of accumulating instance-based
memory and performing recognition-based retrieval. In
simple tasks with few features, this knowledge repre-
sentation and retrieval could hypothetically be done
using all relevant information. However, these meth-
ods do not scale well to complex tasks when exhaustive
enumeration of features is infeasible. This requires cognitive
modelers to design task-specific representations
of state features, as well as similarity metrics, which
can be time consuming and fail to generalize to related
tasks. To address this issue, we leverage recent advancements
in Artificial Neural Networks, specifically
generative models (GMs), to learn representations of
complex dynamic decision making tasks without rely-
ing on domain knowledge. We evaluate a range of GMs
in their usefulness in forming representations that can
be used by IBL models to predict human behavior in
a complex decision making task. This work connects
generative and cognitive models by using GMs to form
representations and determine similarity.
Introduction
Instance Based Learning Theory (IBLT) represents the
cognitive processes for human decision making based
on cognitive memory mechanisms (i.e., recognition, recall,
decay, noise) relevant to dynamic decision making
tasks (Gonzalez, Lerch, and Lebiere 2003). IBLT
brings together the following characteristics: accumula-
tion of examples in memory through training and task
repetition, development of pattern recognition and se-
lective alternative search, similarity-based memory re-
trieval, gradual withdrawal of attention while increas-
ing memory retrieval, and transition from rule-based to
exemplar-based performance.
Copyright © 2023, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
Although IBLT models have been applied to dynamic
tasks involving complex information, this has previ-
ously relied on the use of hand-crafted features of the
environment being represented in an IBL model and,
therefore, the features are unique to each environment.
Another issue in applications of IBL modeling is the requirement
of a static definition of similarity in the space
of environment states throughout modeling.
In contrast, Generative Models (GM) are trained
to learn from a data set the underlying distribution
that is causally responsible for generating those data
(Salakhutdinov 2015). In other words, in GMs, the at-
tributes are not hand-crafted, but are learned from the
data. GMs have been integrated with other learning
models to demonstrate impressive success in improving
learning speed (Higgins et al. 2017).
One useful application of such GMs is in unsupervised
and semi-supervised learning, where the data is
not categorized, or only a small fraction has relevant
categories (Kingma et al. 2014). The learning of rep-
resentations useful for behavioral goals is an important
area of research in modelling human utility-based learn-
ing (Radulescu, Shin, and Niv 2021). However, to date,
the integration of GMs with cognitive models is lacking.
In this work, we propose integrating GMs and IBLT
in a new algorithm called Generative
Environment-Representation Instance-Based Learning
(GERIBL) (pronounced “jur-bl”). This algorithm
seeks to enable IBLT models to leverage pre-trained
models that form representations of environments
for dynamic decision making. This is done by integrating
IBLT with GMs, which are trained to learn from a data
set the underlying distribution that is causally
responsible for generating such data
(Salakhutdinov 2015).
GMs have previously been integrated with Reinforce-
ment Learning (RL) to predict human learning of the
utility of visual stimuli (Malloy, Klinger, and Sims 2022)
and fast generalization to novel tasks (Malloy et al.
2022). This integration of GMs and RL has demonstrated
the usefulness of pre-trained GMs in forming
representations of environments that can be used in cognitive
models of learning. We expect that a similar approach
can be taken by integrating GMs into IBLT, taking
advantage of IBLT's strong cognitive foundations
in cognitive architectures (e.g., ACT-R (Thomson
et al. 2015)).
GERIBL is used as a test bed for the potential integration
of GMs with cognitive models by comparing
different GM approaches. The learning task of generative
models is closely related to the human experience
of making decisions based on visual information. Hu-
mans can leverage their experience observing visual in-
formation outside of the context of decision making to
improve their speed of learning and high generalization
(i.e., transfer of learning). Part of the reason for this
is that humans observe visual information in an unsupervised
context and form representations of that information
that are useful for a variety of tasks. This is similar
to the unsupervised training of deep GMs, which
enables them to form useful representations of information
that are generalizable. GERIBL leverages these
useful features of GMs to integrate with the cognitive
mechanisms of IBLT.
GERIBL describes the general framework for integrating
environment representations learned by a generative
model into an IBL model. We evaluated two
approaches for GMs: AutoEncoders (AEs), which form
representations of stimuli that are useful for reconstruction;
and Generative Adversarial Networks (GANs),
which learn to discriminate between stimuli from the
original data set and generated stimuli while simultaneously
learning to generate environment stimuli that
are similar to the underlying data. The results show the
advantages of the integration of GMs and IBLT.
Preliminaries: Instance-Based Learning Theory
In IBLT, the memory of agents consists of instances
$(s, a, x)$ defined by the state $s$, the action $a$, and the
outcome $x$ (Gonzalez, Lerch, and Lebiere 2003). All instances
are stored in memory as outcomes $x$ and options
$k = (s, a)$. This means that an IBL model requires the
storage of all instances in memory in the form of these
triplets.
At time $t$ there may be $n_{k,t}$ generated instances
$(k, x_{i,k,t})$. Calculating the expected utility of an action
requires an aggregation of all similar instances to determine
their memory activation and probability of retrieval.
Among a set of actions considered at each time step,
agents take the action with the expected maximum util-
ity. Expected utility is calculated through a “blending”
function according to:
$$V_{k,t} = \sum_{i=1}^{n_{k,t}} p_{i,k,t} \, x_{i,k,t} \tag{1}$$
where $n_{k,t}$ is the number of instances in memory, $x_{i,k,t}$ are
the outcomes, and the probability of retrieval $p_{i,k,t}$ is
calculated as:
$$p_{i,k,t} = \frac{\exp\left(\Lambda_{i,k,t} / \tau\right)}{\sum_{j=1}^{n_{k,t}} \exp\left(\Lambda_{j,k,t} / \tau\right)} \tag{2}$$
where $\tau$ is a temperature parameter and the activation
value $\Lambda_{i,k,t}$, which represents the ease of recall of
a specific instance in memory, is calculated according to:
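$$\Lambda_{i,k,t} = \ln\left(\sum_{t' \in T_{i,k,t}} (t - t')^{-d}\right) + \alpha \sum_{j} \mathrm{Sim}_j\left(f_j^{k}, f_j^{k_i}\right) + \sigma \ln\frac{1 - \xi_{i,k,t}}{\xi_{i,k,t}} \tag{3}$$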
where $d$ and $\sigma$ are decay and noise parameters, and
$T_{i,k,t} \subset \{0, ..., t-1\}$ is the set of previous observations of
instance $i$. The similarity function $\mathrm{Sim}(f, f')$ calculates
the similarity of instances in memory with the current
instance (Nguyen, Phan, and Gonzalez 2022). Because
of the relationship between noise $\sigma$ and temperature $\tau$
in IBLT, the temperature parameter $\tau$ is typically set
to $\sigma\sqrt{2}$.
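As an illustration of Eqs. 1-3, the following is a minimal NumPy sketch, not the PyIBL implementation. The function names and default parameter values are hypothetical. The sketch also assumes that $\xi$ is drawn uniformly from (0, 1) and that $\alpha$ simply weights the summed similarities; neither is specified in the text above.

```python
import numpy as np


def activation(t, timestamps, similarities, d=0.5, alpha=1.0, sigma=0.25, rng=None):
    """Activation of one stored instance at time t (Eq. 3).

    timestamps: earlier times t' < t at which the instance was observed.
    similarities: per-feature similarities to the current instance.
    Defaults for d, alpha, and sigma are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    timestamps = np.asarray(timestamps, dtype=float)
    # Base-level term: log of summed power-law decay over past observations.
    base = np.log(np.sum((t - timestamps) ** (-d)))
    # Similarity term: weighted sum of feature similarities.
    partial = alpha * np.sum(similarities)
    # Noise term: xi assumed uniform on (0, 1), kept away from the endpoints.
    xi = rng.uniform(1e-12, 1.0 - 1e-12)
    noise = sigma * np.log((1.0 - xi) / xi)
    return base + partial + noise


def retrieval_probabilities(activations, tau):
    """Softmax over activations with temperature tau (Eq. 2)."""
    scaled = np.asarray(activations, dtype=float) / tau
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()


def blended_value(activations, outcomes, tau):
    """Expected utility of an option as a probability-weighted outcome (Eq. 1)."""
    p = retrieval_probabilities(activations, tau)
    return float(np.dot(p, np.asarray(outcomes, dtype=float)))


if __name__ == "__main__":
    sigma = 0.25
    tau = sigma * np.sqrt(2)  # the typical setting described above
    rng = np.random.default_rng(0)
    acts = [activation(10, ts, [0.0], sigma=sigma, rng=rng) for ts in ([1, 4], [8, 9])]
    print(blended_value(acts, outcomes=[1.0, 0.0], tau=tau))
```

Setting `sigma=0.0` removes the noise term, which makes the activation deterministic and is convenient for checking the base-level and similarity terms in isolation.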
Pattern Recognition
One potential challenge with the use of IBL models in
practice, for real-world problems, is that states can be
significantly complex. This motivates the formation of
hand-crafted representations of the state by cognitive
modelers. A cognitive modeler often represents the fea-
tures in the state of an instance by using the observable
attributes in the environment that are relevant to performing
a task. This has the benefit of more accurately
representing cognitive realities, compared to the alter-
native of storing complex visual information in memory
or using hand-crafted features. The model proposed in
this work seeks to determine whether storing represen-
tations of complex information learned from a GM can
still be useful for modeling cognition, or if the task-
relevant information is lost.
Although the hand-crafted features that cognitive
modelers define might be practical, a disadvantage of
this approach is that they cannot be formed automati-
cally. The representations depend on the cognitive mod-
elers’ judgment of what is important for the task. There
are no general principles or guidelines to decide on the
features that are relevant for the state in a task. Al-
though cognitive modelers rely on what is “observable”
in the task, the selection of features may be arbitrary,
highly determined by the experience of the cognitive
modeler on the task. The model proposed in this work
seeks to address this requirement on cognitive modelers.
Similarity-Based Memory Retrieval
A key feature of IBLT is that the activation function depends
on the similarity $\mathrm{Sim}(f^{k}, f^{k_i})$ between the characteristics
of the environment and the attributes of the
stored instances. This means that recognition, judg-
ment, and choice depend on the method of determining
similarity (Gonzalez, Lerch, and Lebiere 2003). IBLT
also proposes that decision makers learn to focus their
attention on task-relevant features and, in turn, select
the limited information they attend to based on this
similarity (Gonzalez, Lerch, and Lebiere 2003). How-
ever, until now, there has been no principled method to
achieve this goal.
Although measuring similarity is highly relevant in
models designed in IBLT, relatively little work has been
done in IBLT to compare different approaches to measuring
similarity. The similarity function used is often
linear, but modelers sometimes opt for a
non-linear similarity function in a trial-and-error
modeling process. In the next section, the proposed
model will attempt to address the challenge of automat-
ically producing instance attributes, and consistently
and meaningfully measuring similarity, through the in-
tegration of an IBL model with a generative model.
Preliminaries: Generative Models
Generative Models (GM) are a class of machine learn-
ing methods that attempt to learn from a data set
by assuming that a probability distribution generated
the data and attempting to learn the underlying dis-
tribution (Harshvardhan et al. 2020). In this research,
we propose a set of methods to integrate IBL models
with three major classes of generative models, Vari-
ational Autoencoders (VAEs), Generative Adversarial
Networks (GANs), and Visual Transformers (ViT) to
address the current limitations of IBL models described
above.
Figure 1 illustrates the proposed Generative
Environment-Representation Instance-Based Learning
(GERIBL) cognitive model. In this proposed model,
the environment representation can be generated from
the GM, producing an environment state that the IBL
model can use to make decisions from experience. Furthermore,
the figure illustrates how the execution of actions
from an IBL model can influence the environment
presented in the GM.
While other types of generative models exist, these
three were chosen because of their general applicability to
various input modalities (image, text, audio, etc.) and
their usefulness in applications of the learning setting
described later. The remainder of this section provides
background information on these types of generative
models, as well as insight into the usefulness of representations
learned by these approaches in IBL models.
AutoEncoders
Autoencoder (AE) models function by assuming that
there is a set of generative factors $\zeta$ that causally explain
the data in a set $x \in X$. The goal of training these
models is to learn an encoding function $p(z|x)$ and a
decoding function $p(x|z)$ that reflect these generative
factors. The result is a model that can approximate the
true environmental distribution $p^*(x)$.
When used with image data, these models typically
use the general structure of Convolutional Neural Networks
(CNNs) to learn low-dimensional representations
of visual information that can be used to form reconstructions
of unobserved visual stimuli, such as human
faces (Zhang 2018).
Figure 1: GERIBL: Generative Environment Representation
Instance Based Learning Model consisting of a
generative model producing environment stimuli representations
that are used by an instance-based learning
model to make decisions from experience.
Variational Autoencoder: VAE models use a
deep neural network to learn an encoder function
$q_\phi(z|x)$ that outputs constrained representations $z$ of
visual stimuli $x$ (Kingma and Welling 2014). These representations
define a vector of means $\mu_z$ and variances
$\sigma_z$ that form a Normal distribution $\mathcal{N}(\mu_z, \sigma_z)$. This
distribution is sampled to form a vector $z$ that is passed
through the decoder layers $p_\theta(x|z)$ to produce a
reconstruction. These VAE models are trained to minimize
the difference between input and reconstruction
by maximizing the objective function (Pu et al. 2016):
$$\mathcal{L}(\theta, \phi; x, z) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] \tag{4}$$
This learning objective is guaranteed to learn a generative
model that will approximate the true environmental
distribution $p^*(x)$. However, there is no guarantee
of any meaningful connection between the learned
latent representation $z$ and the true generative factors
$\zeta$ (Chen et al. 2016). This lack of connection could be
problematic for decision models based on these inter-
nal representations, potentially motivating the use of
alternative training (Aridor, da Silveira, and Woodford
2023).
β-Variational Autoencoder: β-VAE models seek to connect
generative factors $\zeta$ and latent representations $z$
by adjusting the training of traditional VAEs, introducing
a $\beta$ parameter that further controls the information
bottleneck (Burgess et al. 2018). This is done
by penalizing a metric of informational complexity of
the representations, the KL-divergence between
the encoder distribution $q_\phi(z|x)$ and the latent
prior $p(z)$, using the training
function (Higgins et al. 2016):
$$\mathcal{L}(\theta, \phi; x, z, \beta) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - \beta D_{KL}\left(q_\phi(z|x) \,\|\, p(z)\right) \tag{5}$$
The $\beta$ parameter allows for additional control over
the information bottleneck of the model by adding a
weight to the informational complexity of the latent
representations defining the multivariate Gaussian distribution.
The result is that the entire model is trained
to balance the accuracy of reconstruction and the com-
plexity of latent representation in an adjustable fashion.
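To make the trade-off in the β-VAE objective concrete, here is a minimal NumPy sketch of a single-sample estimate of Eq. 5. It rests on assumptions not stated above: a standard Normal prior $p(z)$, a diagonal Gaussian encoder parameterized by a mean and log-variance, and a Bernoulli decoder over pixel intensities in [0, 1]. All function names are hypothetical.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var)


def bernoulli_log_likelihood(x, x_hat, eps=1e-7):
    """log p(x|z) for a Bernoulli decoder whose output is x_hat (assumed)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.clip(np.asarray(x_hat, dtype=float), eps, 1.0 - eps)
    return np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))


def beta_vae_objective(x, x_hat, mu, log_var, beta=4.0):
    """Single-sample estimate of Eq. 5 (to be maximized).

    The default beta is an arbitrary illustrative value.
    """
    reconstruction = bernoulli_log_likelihood(x, x_hat)
    complexity = gaussian_kl_to_standard_normal(mu, log_var)
    return reconstruction - beta * complexity
```

With `beta=0.0` the expression reduces to the reconstruction term of Eq. 4; larger values of `beta` penalize informative latent codes more strongly.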
Image Transformers
Pre-trained transformer models have the advantage of
wide applicability to a variety of different tasks and domains,
particularly in the context of Natural Language
Processing (NLP) (Wolf et al. 2019). However, concerns
have been raised over the use and usability of massive
pre-trained transformer models, suggesting that their
output may be the results of spurious correlations and
stochasticity (Bender et al. 2021). Part of the testing
of the Transformer-based GMs with GERIBL will be to
compare models pre-trained using the exact same stimuli
with ones trained using similar stimuli.
Image-based transformers apply transformer-based
self-attention mechanisms to machine learning domains
with visual data (Parmar et al. 2018; Dosovitskiy et al.
2020). The two models used to test the GERIBL model
use transformers, and differ in their training methods
and the size and form of their representations of visual
information.
Vision Transformer VAE: Variational Autoen-
coders trained using transformer models are able to
learn constrained representations of images of vari-
able size that are still useful for reconstruction (He
et al. 2022). These models can be integrated into the
GERIBL cognitive model using the encodings learned
by a Visual Transformer Variational Autoencoder (ViT-
VAE) model.
The ViT-VAE model uses 4 attention heads, and 2
NN layers of 64 nodes for the multi-layer perceptron
layers. The loss function is based on the difference between
the input and reconstruction. The VAE encoding
representation is used by the GERIBL model as an en-
vironment state representation, and takes the form of a
vector of real numbers of size 100.
Attention: The second transformer-based GM that
is compared using the GERIBL model uses learned values
from the self-attention heads of the transformer network
when processing visual information. This model is
referred to as the Attention model.
The Attention model has the same general structure
as the ViT-VAE model, with the main difference being
that it is not trained to reconstruct lossy versions of input
stimuli. The second difference is the form and size of
the representation that is used by the GERIBL model.
In the case of the Attention model, the values of the 4
self-attention heads are used as the representations for
the GERIBL model.
Generative Adversarial Networks
Generative Adversarial Network (GAN) models are
trained using generator and discriminator networks
(Salakhutdinov 2015). The goal of the generator is to
produce images that appear similar to those in the
training data set so that the discriminator network cannot
tell the difference. The goal of the discriminator is to
determine if a given image was produced by the genera-
tor or is a genuine original data set member. These mod-
els are trained in tandem in an adversarial structure.
Two GAN-based models are integrated with the
proposed GERIBL model for comparison. Both
models have the same general structure and training,
differing only in the size of their internal representation
space and other network features.
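The adversarial training described above can be illustrated with the two loss terms it alternates between. This is a minimal NumPy sketch of the commonly used binary cross-entropy GAN losses with the non-saturating generator loss. The text does not state which loss variant the models here use, so treat the choice as an assumption; the function names are hypothetical.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Cross-entropy loss for the discriminator.

    d_real: discriminator outputs in (0, 1) on genuine data set members.
    d_fake: discriminator outputs in (0, 1) on generator samples.
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))


def generator_loss(d_fake, eps=1e-7):
    """Non-saturating generator loss: low when generated samples are rated as real."""
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return -np.mean(np.log(d_fake))
```

The two networks pull `d_fake` in opposite directions: the discriminator loss falls as `d_fake` approaches 0, while the generator loss falls as it approaches 1.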
GAN Model: The first GAN model uses representations
of size 100 to complete the learning objectives
of the generator and discriminator networks. This is
considered to be an ‘unconstrained’ version of a GAN,
analogous to the VAE model which has a larger repre-
sentation size and information complexity compared to
the β-VAE model. The calculation of similarity of the
GAN model is determined by the
Constrained GAN: The second GAN-based model
shares the motivation of the β-VAE
model: using an information bottleneck to produce
constrained representations that are less informationally
complex, allowing for faster generalization, while
still being useful for the IBL module. This is done by
reducing the size of the internal representation from
100 to 3, the same size as the latent representation of
the β-VAE model. Additionally, the generator and discriminator
network feature map is reduced from 64 to
8, imposing a stricter information bottleneck.
All other model structures and hyper-parameters
are kept the same.
GERIBL: Proposed Model
The proposed Generative Environment-Representation
Instance-Based Learning (GERIBL) model is the integration
of IBLT (using the Python implementation of IBLT
called PyIBL) and generative models. We compare a variety
of GMs, including VAEs and GANs, in their ability
to form representations of visual information that
can be used in a cognitive architecture model of dynamic
decision making. This integration is achieved primarily
by replacing environment state $s$ with the corresponding
GM internal representations $p(z)$. The result is a
cognitive architecture that predicts human recognition,
judgement, choice, and execution based on constrained
representations of visual information.
Figure 2: Contextual bandit learning task stimuli used in Experiment 1 (Right Panel) (https://nivlab.princeton.edu/data)
and in Experiment 2 (Left, Middle, and Right Panel). Left panel: The first set of stimuli shown to participants
in Experiment 2. Middle panel: The second set of stimuli shown to participants in Experiment 2. Right panel: The
third and final set of stimuli shown to participants in Experiment 2. This is also the stimulus used in Experiment 1,
in which participants learned which of the 9 possible features (shape, color, texture) was associated with a higher reward.
Furthermore, the GERIBL model alters the IBLT activation
function (Eq. 3) by replacing the feature-based
similarity function $\mathrm{Sim}(f, f')$ with one based
on the internal representation $z$ of the GM and the similarity
metric of the GM, $\mathrm{Sim}_{GM}$, as follows:
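$$\Lambda_{i,k,t} = \ln\left(\sum_{t' \in T_{i,k,t}} (t - t')^{-d}\right) + \alpha \sum_{j} \mathrm{Sim}_{GM}\left(p(z_k|k), p(z_{k_j}|k_j)\right) + \sigma \ln\frac{1 - \xi_{i,k,t}}{\xi_{i,k,t}} \tag{6}$$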
where $p(z_k|k)$ is the GM internal representation of the
observed state and $p(z_{k_j}|k_j)$ are the GM internal representations
of each instance $j$ in memory. Importantly,
this altered activation function avoids the necessity of
storing the full original environment stimuli, instead allowing
cognitive mechanisms to use low-dimensional
representations of environments.
The type of GM that is used in the GERIBL model
results in differences based on how the internal representations
of each GM are formed and how those models
determine representation similarity. For example, the β-
VAE determines similarity based on the loss in Eq. 5,
according to the KL divergence between the two rep-
resentation distributions and their informational com-
plexities.
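One way to picture a KL-based similarity between two GM representations is sketched below. The text does not specify how the divergence is mapped to a similarity score, so this sketch makes its own assumptions: each representation is a diagonal Gaussian given as a (mean, variance) pair, and similarity is the negative symmetrized KL divergence, so that identical representations score 0 and dissimilar ones score lower. The names `gaussian_kl` and `sim_gm` are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_a, var_a, mu_b, var_b):
    """KL( N(mu_a, diag(var_a)) || N(mu_b, diag(var_b)) ) in closed form."""
    mu_a, var_a, mu_b, var_b = (
        np.asarray(v, dtype=float) for v in (mu_a, var_a, mu_b, var_b)
    )
    return 0.5 * np.sum(
        np.log(var_b / var_a) + (var_a + (mu_a - mu_b) ** 2) / var_b - 1.0
    )


def sim_gm(rep_a, rep_b):
    """Similarity of two latent representations (assumed form).

    Each rep is a (mean, variance) pair. The result is the negative of the
    symmetrized KL divergence, so the maximum similarity is 0.
    """
    (mu_a, var_a), (mu_b, var_b) = rep_a, rep_b
    forward = gaussian_kl(mu_a, var_a, mu_b, var_b)
    backward = gaussian_kl(mu_b, var_b, mu_a, var_a)
    return -0.5 * (forward + backward)
```

A score of this kind could stand in for the similarity term of the activation function, with one latent representation for the observed stimulus and one for each instance in memory.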
Model Representations
Another benefit of using GM-acquired representations
as instances of IBL models is that they can be updated
as the IBL model learns the utility of choice options.
This can reect the tendency of decision makers to at-
tend to features that are more relevant for a task at
hand, which in turn changes how they represent in-
formation internally. Previous work has compared how
β-VAE model representations can change as utility is
learned in a bandit task involving images of human
faces (Malloy et al. 2022). This is integrated into the
proposed model by training the generative model with
feedback from the GERIBL model blending function $V_{k,t}$,
which uses the activation function in Eq. 6, according to:
$$\mathcal{L}(\upsilon, k) = \upsilon \left(V_{k,t} - x_k\right)^2 \tag{7}$$
where $V_{k,t}$ is the predicted utility of the IBL model
before choice selection, and $x_k$ is the true observed outcome.
This functionality of the proposed model allows
for the updating of representations of environments as
the relevance to utility of different features is learned.
This utility-based training of generative model repre-
sentations has demonstrated more human-like decision-
making, reproducing biases in utility selection (Aridor,
da Silveira, and Woodford 2023), and fast generaliza-
tion (Malloy et al. 2022).
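The utility feedback term can be sketched in a few lines. The sketch reads Eq. 7 as a squared prediction error scaled by υ, which is treated here as a weighting parameter since the text does not define it. How this term is combined with the GM's own training loss is also not specified; simple addition is used purely as an assumption, and both function names are hypothetical.

```python
def utility_feedback_loss(v_pred, x_obs, upsilon=1.0):
    """Weighted squared error between blended value and observed outcome."""
    return upsilon * (v_pred - x_obs) ** 2


def total_gm_loss(gm_loss, v_pred, x_obs, upsilon=1.0):
    """GM training loss plus the utility feedback term (assumed additive)."""
    return gm_loss + utility_feedback_loss(v_pred, x_obs, upsilon)
```

With `upsilon=0.0` the GM is trained on its own objective alone, which corresponds to the pre-trained, utility-free setting.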
Learning Tasks
Experiment 1: Visual Utility Learning
The first learning task was originally described in (Niv
et al. 2015). The data were collected by the Princeton Niv Neuroscience
Lab and made publicly available on their lab website.¹
The experiment was approved by the Princeton
University Institutional Review Board.
This task consisted of a contextual n-armed bandit
in which participants were shown 3 different choice options
consisting of a shape (square, circle, triangle),
color (red, green, blue), and texture (hatched, dotted,
wavy), as shown in Figure 2 (Right panel). On each
trial, the color, shape, and texture of each option are
randomized, with one instance of each feature type oc-
curring across the stimuli options (i.e., there is always
1 green option, 1 square option, etc.).
Experiment episodes were of variable length, roughly 20-25
decision trials, during which the same 1 of the 9
possible features was associated with a higher probability
(75% vs. 25%) of observing a reward of 1 instead of a
reward of 0. Data from 22 participants were collected in
this task, each making a total of 500 choice selections.
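The trial structure of this task can be simulated with the standard library. The sketch below is an illustration of the description above, not the original experiment code. The helper names are hypothetical, and it assumes an option earns the higher reward probability whenever it contains the target feature.

```python
import random

SHAPES = ("square", "circle", "triangle")
COLORS = ("red", "green", "blue")
TEXTURES = ("hatched", "dotted", "wavy")


def make_trial(rng):
    """Build 3 options so that every feature value appears exactly once."""
    shapes = rng.sample(SHAPES, 3)
    colors = rng.sample(COLORS, 3)
    textures = rng.sample(TEXTURES, 3)
    return [
        {"shape": s, "color": c, "texture": t}
        for s, c, t in zip(shapes, colors, textures)
    ]


def reward(option, target_feature, rng, p_high=0.75, p_low=0.25):
    """Return 1 or 0; options containing the target feature pay off more often."""
    p = p_high if target_feature in option.values() else p_low
    return 1 if rng.random() < p else 0


if __name__ == "__main__":
    rng = random.Random(0)
    target = "green"  # one of the 9 features, fixed within an episode
    for _ in range(3):
        options = make_trial(rng)
        choice = rng.choice(options)  # placeholder for a model's choice
        print(choice, reward(choice, target, rng))
```

Because exactly one option per trial contains the target feature, a chooser that has identified it can expect a reward on 75% of trials, while uniformly random choice picks the target option a third of the time.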
¹ https://nivlab.princeton.edu/data
Experiment 2: Transfer of Learning
The data for this second experiment were originally collected and
detailed in (Malloy et al. 2023) by the Dynamic Decision
Making lab at Carnegie Mellon University, and made
publicly available on OSF.² Sixty participants were recruited
online through Amazon Mechanical Turk. The
experiment was pre-registered on OSF and approved by
the Carnegie Mellon University Institutional Review Board.
For full methods see (Malloy et al. 2023).
This experiment sought to test human Transfer of
Learning (ToL), referring to the application of previ-
ously learned skills onto a new task. The learning task
in Experiment 2 involves ToL in which participants first
learned the values associated with shapes alone, then
shapes and colors, and finally the same shape-color-texture
features described in (Niv et al. 2015). The rewards
ranged from roughly 4-6 points, determined by
the features of the chosen option, with random noise
added to the reward points to make the learning task
more challenging.
The experiment episodes consisted of 14 trials of each
type in the order shown in Figure 2. During one set
of trials, one of the three feature options was associ-
ated with a higher reward (roughly 7 vs. 5). As the
experiment progressed, the previously high-valued fea-
ture continued to indicate that an option had a higher
value. For example, if a square is associated with a
higher expected utility initially, then red squares will
have a higher expected utility than red triangles for the
remainder of the experiment block. The same is true for
the higher utility color once the texture is introduced.
Model and Human Performance
This section compares the 6 previously mentioned GMs
in their ability to be integrated with the proposed
GERIBL model. These GMs are pre-trained with a sub-
set of the stimuli shown in Figure 2, either the 3 shape
stimuli, 9 shape-color stimuli, or 27 shape-color-texture
stimuli. After this pre-training, the models are used to
produce a representation that the IBL module of the
GERIBL model takes in as an environment state. We
use the two learning experiments to compare human
participant performance, the 6 proposed GM instantia-
tions of GERIBL, and a handcrafted version of the IBL
model.
Visual Utility Learning
In the first experiment on visual utility learning, GMs
are pre-trained using only the shape-color-texture stimuli
set of 27 images. The results in Figure 3 compare the three
types of GMs (VAE, Transformer, and GAN) with
human performance and an IBL model using hand-crafted
features. These results demonstrate that all
GMs roughly emulate human-like performance, with
the worst performing GMs being the GAN and ViT-
VAE model.
² https://osf.io/mt4ws/
Figure 3: Model and participant average probability of
selecting the correct option in the contextual bandit
task within an episode. The chance rate is 1/3.
In Figure 3, the blue models correspond to the GMs
with smaller representation sizes, and the orange models
correspond to the GMs with larger representation
sizes. As shown, the GMs with smaller representations
are a better fit to human behavior compared to
those with larger representations. This is likely due to
the fact that smaller representations are less informa-
tionally complex and thus are easier to quickly gener-
alize. These results indicate that one important factor
of GMs when integrating them in the GERIBL model
is the informational complexity of representations.
However, when using simple representations it is im-
portant to retain enough information for behavioral
goals. If the GM representations were too simple, they
could remove information relevant to the task, making
it difficult for the IBL module to learn. This would
be a detriment to applying the GERIBL model, since
the main benefit is the possibility of automatically generating
environment features, as well as a metric for
comparing them.
Transfer of Learning
Transfer of learning is related to the goals of applying
GMs to cognitive modeling in the potential application
of pre-trained models to novel environments. To
compare the ability of GM representations to be applied
to new tasks, we limit the training data sets in
Experiment 2 by including only the shape images, only
the shape-color images, and finally only the shape-color-texture
images (see Figure 2). This produces 3 sets of
GMs, one for each image type, that are used to produce
representations of the visual information used to make
decisions in the other two types of tasks.
Figure 4: GERIBL model average reward in the second experiment separated by the experiment condition. Generative
Models were trained only on a subset of the stimuli space indicated by color shade. Purple lines represent IBL model
performance using hand-crafted features. Green lines represent participant performance.
The first noticeable aspect of these results is that
the majority of GMs had a higher transfer of learning
compared to the IBL model with hand-crafted features.
This can be observed in the asymptotic reward (measured
by the average reward on the final 5 trials) of each
GM trained on a subset of stimuli and tested in each
of the experiment task conditions. Of these GMs, the
best performing is the Transformer model using Attention
values as its representation, which matches human
performance regardless of the stimuli it was trained on.
This indicates that this model has learned an efficient
representation of the stimuli applicable to related tasks.
In addition to testing GMs in their ability to be
applied to a novel experiment task, these results
strengthen the two other motivations of GERIBL:
automatically determining relevant stimuli features and a
metric of similarity. If GMs required a unique training
approach for each stimuli space limited to that task,
then the applicability of pre-trained models would be
significantly diminished. We show that GMs with small
representation spaces can produce
human-like learning patterns even with novel stimuli.
Conclusions
The GERIBL model uses GMs to improve an IBL model
in three areas. Firstly, it uses representations of task
environments that are generated automatically, with-
out requiring cognitive modellers to develop a feature
set for each new task. Secondly, it allows for a metric
of similarity defined by the GM training, instead of by
cognitive modellers. Thirdly, it allows for improved pre-
diction of human behavior in transfer of learning tasks,
also demonstrating the ability of pre-trained GMs to be
applied onto novel tasks.
Of the GMs tested using the GERIBL model, the
β-VAE based model has the closest connection to bio-
logical visual processing, which has been related to the
disentanglement objective (Higgins et al. 2021). How-
ever, performing a complete analysis and comparison of
different types of GMs provides support for our proposed
model as a general framework for integrating GMs into
cognitive architectures that replicate human learning.
In addition to these main benefits, the results shown
here point towards future research investigating the impact
of utility on the representations learned by GMs.
This could be one area where GMs differ highly in their
connection to human cognition, as they would likely
react differently to training that incorporated utility
prediction. Previous work has compared GM representations
as utility is learned in simulated settings (Malloy,
Klinger, and Sims 2022), but has not yet compared them to
behavior from human participants.
Acknowledgements
This research was sponsored by the Army Research
Office and accomplished under Australia-US MURI
Grant Number W911NF-20-S-000 and by the Army Research
Laboratory under Cooperative Agreement Number
W911NF-13-2-0045 (ARL Cyber Security CRA).
References
Aridor, G.; da Silveira, R. A.; and Woodford, M.
2023. Information-Constrained Coordination of Eco-
nomic Behavior.
Bender, E. M.; Gebru, T.; McMillan-Major, A.; and
Shmitchell, S. 2021. On the dangers of stochastic par-
rots: Can language models be too big? In Proceed-
ings of the 2021 ACM conference on fairness, account-
ability, and transparency, 610–623.
Burgess, C. P.; Higgins, I.; Pal, A.; Matthey, L.; Wat-
ters, N.; Desjardins, G.; and Lerchner, A. 2018. Un-
derstanding disentangling in β-VAE. arXiv preprint
arXiv:1804.03599.
Chen, X.; Kingma, D. P.; Salimans, T.; Duan, Y.;
Dhariwal, P.; Schulman, J.; Sutskever, I.; and Abbeel,
P. 2016. Variational Lossy Autoencoder. In Interna-
tional Conference on Learning Representations.
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn,
D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer,
M.; Heigold, G.; Gelly, S.; et al. 2020. An Image is
Worth 16x16 Words: Transformers for Image Recogni-
tion at Scale. In International Conference on Learning
Representations.
Gonzalez, C.; Lerch, J. F.; and Lebiere, C. 2003.
Instance-based learning in dynamic decision making.
Cognitive Science, 27(4): 591–635.
Harshvardhan, G.; Gourisaria, M. K.; Pandey, M.; and
Rautaray, S. S. 2020. A comprehensive survey and anal-
ysis of generative models in machine learning. Com-
puter Science Review, 38: 100285.
He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; and Gir-
shick, R. 2022. Masked autoencoders are scalable vi-
sion learners. In Proceedings of the IEEE/CVF con-
ference on computer vision and pattern recognition,
16000–16009.
Higgins, I.; Chang, L.; Langston, V.; Hassabis, D.; Sum-
merfield, C.; Tsao, D.; and Botvinick, M. 2021. Unsu-
pervised deep learning identifies semantic disentangle-
ment in single inferotemporal face patch neurons. Na-
ture communications, 12(1): 6456.
Higgins, I.; Matthey, L.; Pal, A.; Burgess, C.; Glorot,
X.; Botvinick, M.; Mohamed, S.; and Lerchner, A. 2016.
beta-vae: Learning basic visual concepts with a con-
strained variational framework. In International con-
ference on learning representations.
Higgins, I.; Pal, A.; Rusu, A.; Matthey, L.; Burgess,
C.; Pritzel, A.; Botvinick, M.; Blundell, C.; and Ler-
chner, A. 2017. Darla: Improving zero-shot transfer in
reinforcement learning. In International Conference on
Machine Learning, 1480–1490. PMLR.
Kingma, D. P.; Mohamed, S.; Jimenez Rezende, D.; and
Welling, M. 2014. Semi-supervised learning with deep
generative models. Advances in neural information pro-
cessing systems, 27.
Kingma, D. P.; and Welling, M. 2014. Auto-Encoding
Variational Bayes. stat, 1050: 1.
Malloy, T.; Du, Y.; Fang, F.; and Gonzalez, C. 2023.
Accounting for Transfer of Learning using Human Be-
havior Models. Human Computation and Crowdsourc-
ing.
Malloy, T.; Klinger, T.; and Sims, C. R. 2022. Modeling
human reinforcement learning with disentangled visual
representations. Reinforcement Learning and Decision
Making (RLDM).
Malloy, T. J.; Sims, C. R.; Klinger, T.; Riemer, M. D.;
Liu, M.; and Tesauro, G. 2022. Learning in Factored
Domains with Information-Constrained Visual Repre-
sentations. In NeurIPS 2022 Workshop on Information-
Theoretic Principles in Cognitive Systems.
Nguyen, T. N.; Phan, D. N.; and Gonzalez, C. 2022.
SpeedyIBL: A comprehensive, precise, and fast imple-
mentation of instance-based learning theory. Behavior
Research Methods, 1–24.
Niv, Y.; Daniel, R.; Geana, A.; Gershman, S. J.; Leong,
Y. C.; Radulescu, A.; and Wilson, R. C. 2015. Rein-
forcement learning in multidimensional environments
relies on attention mechanisms. Journal of Neuro-
science, 35(21): 8145–8157.
Parmar, N.; Vaswani, A.; Uszkoreit, J.; Kaiser, L.;
Shazeer, N.; Ku, A.; and Tran, D. 2018. Image trans-
former. In International conference on machine learn-
ing, 4055–4064. PMLR.
Pu, Y.; Gan, Z.; Henao, R.; Yuan, X.; Li, C.; Stevens,
A.; and Carin, L. 2016. Variational autoencoder for
deep learning of images, labels and captions. Advances
in neural information processing systems, 29.
Radulescu, A.; Shin, Y. S.; and Niv, Y. 2021. Hu-
man representation learning. Annual Review of Neu-
roscience, 44: 253–273.
Salakhutdinov, R. 2015. Learning deep generative mod-
els. Annual Review of Statistics and Its Application, 2:
361–385.
Thomson, R.; Lebiere, C.; Anderson, J. R.; and
Staszewski, J. 2015. A general instance-based learning
framework for studying intuitive decision-making in a
cognitive architecture. Journal of Applied Research in
Memory and Cognition, 4(3): 180–190.
Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; De-
langue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.;
Funtowicz, M.; et al. 2019. transformers: State-
of-the-art natural language processing. arXiv preprint
arXiv:1910.03771.
Zhang, Y. 2018. A better autoencoder for image: Con-
volutional autoencoder. In International Conference on
Neural Information Processing ICONIP17-DCEC.