Algebras of actions in an agent's representations of the world
Alexander Dean, Eduardo Alonso, Esther Mondragón
Artificial Intelligence Research Centre (CitAI), Department of Computer Science,
City, University of London, Northampton Square, EC1V 0HB, London, UK
Abstract
In this paper, we propose a framework to extract the algebra of the transformations of
worlds from the perspective of an agent. As a starting point, we use our framework
to reproduce the symmetry-based representations from the symmetry-based disentangled
representation learning (SBDRL) formalism proposed by [1]; only the algebra of
transformations of worlds that form groups can be described using symmetry-based
representations.
We then study the algebras of the transformations of worlds with features that occur
in simple reinforcement learning scenarios. Using computational methods that we
developed, we extract the algebras of the transformations of these worlds and classify them
according to their properties.
Finally, we generalise two important results of SBDRL - the equivariance condition
and the disentangling definition - from only working with symmetry-based representations
to working with representations capturing the transformation properties of worlds
with transformations for any algebra, and we combine our generalised equivariance
condition and our generalised disentangling definition to show that disentangled
sub-algebras can each have their own individual equivariance conditions, which can be
treated independently.
Keywords: Representation learning, Agents, Disentanglement, Symmetries, Algebra.
1. Introduction
Artificial intelligence (AI) has progressed significantly in recent years due to massive
increases in available computational power allowing the development and training of new
deep learning algorithms [2, 3]. However, the best-performing learning algorithms often
suffer from poor data efficiency and lack the levels of robustness and generalisation that
are characteristic of nature-based intelligence. The brain appears to solve complex tasks
by learning efficient, low-dimensional representations through the simplification of the
tasks by focusing on the aspects of the environment that are important to the performance
of each task [4, 5, 6, 7, 8, 9]. Furthermore, there is evidence that representations in nature
*Corresponding author.
Email addresses: alexander.dean@city.ac.uk (Alexander Dean), e.alonso@city.ac.uk (Eduardo
Alonso), e.mondragon@city.ac.uk (Esther Mondragón)
and in artificial intelligence networks support generalisation and robustness [10, 11, 12,
13, 14, 15, 16].
In its most general form, a representation is an encoding of data. The encoded form
of the data is usually more useful than the original data in some way. For example,
the representation of the data could be easier to transfer than the original data, it
could improve the efficiency or performance of an algorithm or biological system that
uses the representation [17, 18], or the representation could be a recreation of a partially
observable data distribution through the combination of many samples of the distribution
[19, 20].
In machine learning the use of representations has been shown to improve generalisa-
tion capabilities and task performance across a range of areas such as computer vision,
natural language processing, and reinforcement learning [18, 21, 22, 23]. Representation
learning algorithms can be thought of as seeking to remove the need for human-curated,
task-specific feature engineering by performing useful feature engineering automatically.
Representations even emerge naturally in neural networks trained to perform multiple
tasks [24]. Representation learning is the process of devising a method to transform the
input data into its encoded form.
An agent can interact with its world. In this work, we consider an artiο¬cial agent
interacting with a world to learn a representation of that world. In particular, we are
interested in the representation that the agent should have ideally learned by the end
of the representation learning process. There are some properties of the world that can
only be learned in the agent's representation if the agent interacts with its environment
[25, 26]. Agents have an internal state (the agent's representation), which can be viewed
as the agent's understanding of the world, and an external state, which is made up of
everything in the world not in the agent's internal state. The agent accesses information
about its external state through sensory states (or sensors) and affects the world through
action states (or actions). This boundary between the external and internal states is
known as a Markov blanket [27, 28]. The world can be treated as a data distribution,
which the agent samples using its sensors and interacts with using its action states. The
agent continually updates its internal state using information gained from its sensors as
it learns more about the world (representation learning). The agent is attempting to
create a representation of this data distribution by applying various operations to the
distribution and observing the outcome of these operations. The operations that the
agent applies to the distribution are the actions of the agent, and each sample that the
agent takes of the distribution using its sensors is an observation of the world. For our
purposes, it is important to note that the agent does not have to be embodied within the
world it is interacting with, it only needs to be able to sample observations of the world
using its sensors and to be able to interact with the world using its actions.
The area of representation learning that deals with representations of worlds that
are influenced by the actions of agents is called state representation learning. State
representation learning is a particular case of representation learning where features are
low dimensional, evolve through time, and are influenced by the actions of agents [17].
More formally, an agent uses sensors to form observations of the states of the world
(the observation process); these observations can then be used to create representations
of the world (the inference process). State representation learning has been combined
with reinforcement learning to enable algorithms to learn more efficiently and with an
improved ability to transfer what they have learned to other tasks (improved generalisation) [29]. It
has been hypothesised that the success of state representation learning is because the
learned abstract representations describe general priors about the world in a way that is
easier to interpret and utilise while not being task specific; this makes the representations
useful for solving a range of tasks [18].
Reinforcement learning is a decision-making algorithm that involves an agent inter-
acting with its environment in order to maximise the total amount of some reward signal
it receives [30, 31, 32, 33]. We choose to explore the structure of worlds containing
features found in common reinforcement learning scenarios because representations of a
world from an agent interacting with that world, as found in our work, are often given
to reinforcement learning algorithms to improve the performance of these reinforcement
learning algorithms.
The question of what makes a "good" representation is vital to representation learning.
[1] argues that the symmetries of a world are important structures that should
be present in the representation of that world. The study of symmetries moves us away
from studying objects directly to studying the transformations of those objects and using
the information about these transformations of objects to discover properties about the
objects themselves [34]. The field of Physics experienced such a paradigm shift due to
the work of Emmy Noether, who proved that conservation laws correspond to continuous
symmetry transformations [35], and the field of AI is undergoing a similar shift.
The exploitation of symmetries has led to many successful deep-learning architec-
tures. Examples include convolutional layers [36], which utilise translational symmetries
to outperform humans in image recognition tasks [37], and graph neural networks [38],
which utilise the group of permutations. Not only can symmetries provide a useful indica-
tor of what an agent has learned, but incorporating symmetries into learning algorithms
regularly reduces the size of the problem space, leading to greater learning efficiency
and improved generalisation [34]. In fact, it has been shown that a large majority of
neural network architectures can be described as stacking layers that deal with different
symmetries [39]. The main methods used to integrate symmetries into a representation
are to build symmetries into the architecture of learning algorithms [40, 41], use data-
augmentation that encourages the model to learn symmetries [42, 43], or to adjust the
model's learning objective to encourage the representation to exhibit certain symme-
tries [44, 45]. The mathematical concept of symmetries can be abstracted to algebraic
structures called groups.
There are two main types of symmetries that are used in AI: invariant symmetries,
where a representation does not change when certain transformations are applied to
it, and equivariant symmetries, where the representation reflects the symmetries of the
world. Historically, the learning of representations that are invariant to certain transfor-
mations has been a successful line of research [46, 47, 48, 49]. In building these invariant
representations, the agent effectively learns to ignore the invariant transformation since
the representation is unaffected by the transformation. It has been suggested that this
approach can lead to more narrow intelligence, where an agent becomes good at solving
a small set of tasks but struggles with data efficiency and generalisation when tasked
with new learning problems [50, 51]. Instead of ignoring certain transformations, the
equivariant approach attempts to preserve symmetry transformations in the agent's rep-
resentation in such a way that the symmetry transformations of the representation match
the symmetry transformations of the world. It has been hypothesised that the equiv-
ariant approach is likely to produce representations that can be reused to solve a more
diverse range of tasks since no transformations are discarded [34]. Equivariant symmetry
approaches are commonly linked with disentangling representations [18], in which the
agent's representation is separated into subspaces that are invariant to different trans-
formations. Disentangled representation learning, which aims to produce representations
that separate the underlying structure of the world into disjoint parts, has been shown
to improve the data efficiency of learning [52, 53].
Inspired by their use in Physics, symmetry-based disentangled representations (SB-
DRs) were proposed by [1] as a formal mathematical definition of disentangled represen-
tations. SBDRs are built on the assumption that the symmetries of the world state are
important aspects of that world state that need to be preserved in an agent's internal rep-
resentation (i.e., the symmetries that are present in the world state should also be present
in the agent's internal representation state). They describe symmetries of the world state
as "transformations that change only some properties of the underlying world state, while
leaving all other properties invariant" [1, page 1]. For example, the $y$-coordinate of an
agent moving parallel to the $x$-axis on a 2D Euclidean plane does not change. SBDRL has
gained traction in AI in recent years [54, 55, 56, 57, 58, 59, 60, 61, 25, 62, 63]. However,
symmetry-based disentangled representation learning (SBDRL) only considers actions
that form groups and so cannot take into account, for example, irreversible actions [1].
[25] showed that a symmetry-based representation cannot be learned using only a train-
ing set composed of the observations of world states; instead, a training set composed
of transitions between world states, as well as the observations of the world states, is
required. In other words, SBDRL requires the agent to interact with the world. This is
in agreement with the empirically proven idea that active sensing of a world can be used
to make non-invertible mappings from the world state to the representation state into
invertible mappings [64], and gives mathematical credence to SBDRL.
We agree with [1] that symmetry transformations are important structures to include
in an agent's representation, but want to take their work one step further: we posit that
the relationships of transformations of the world due to the actions of the agent should
be included in the agent's representation of the world. We will show that only including
transformations of the actions of an agent that form groups in an agent's representation
of a world would lose important information about the world, since we demonstrate that
features of many worlds cause transformations due to the actions of an agent to not form
group structures. We generalise some important results, which have been put forward
for worlds with transformations of the actions of an agent that form groups, to worlds
with transformations of the actions of an agent that do not form groups. We believe that
including the relationships of these transformations in the agent's representation has the
potential for powerful learning generalisation properties: (a) Take the following thought
experiment: Consider an agent that has learned the structure of the transformations due
to its actions of a world $W$. Now consider a world $W'$, which is identical to world $W$ in
every way except that observations of the state of the world from a collection of the agent's
sensors are shifted by a constant amount $c$ (if the sensors were light sensors then $W$ and
$W'$ would only be different in colour). So the observations of $W'$ in state $s'$ would be given
by $o_{s'} = o_s + c$, where $o_{s'}$ is the observation of $W'$ in state $s'$, and $o_s$ is the observation of
$W$ in the state $s$. The relationship between the transformations of the world due to the
agent's actions is the same in both worlds but the observations are different. Therefore,
the agent only needs to learn how to adjust for the shift $c$ in the data from some of its
sensors to be able to learn a good representation of $W'$. In fact, the agent might have
already learned that the transformations of the world due to particular actions cause
relative changes to certain sensor values rather than depending on the raw sensor values.
(b) If the agent completely learns the structure of the world from one state $s$ then it
knows it from all states of the world. By taking any sequence of actions that go from $s$
to some new state and then applying that sequence of actions to the relations that the
agent possesses for state $s$, the agent can produce the relationship between actions from
the new state (the relationship between the actions is dependent on the current state,
similar to how local symmetries in Physics are dependent on space-time coordinates).
(c) Another form of generalisation could be due to the action algebra being independent
of the starting state; in other words, the relationship between transformations due to
the agent's actions from one state is the same as from any other state in the world -
the relationships between actions have been generalised to every state in the world. This
would allow the agent to learn the effect of its actions faster since the relationship between
actions is the same in any state. (d) In a partially observable world, previously explored
relationships between actions could be extrapolated to unexplored areas. If these areas
have the same or similar structure to the relationships between actions then the agent
could generalise what it has learned previously to unexplored parts of the world.
We also believe that a general framework for exploring the algebra of the transfor-
mations of worlds containing an agent, as proposed in this paper, has the potential to
be used as a tool in the field of explainable AI. With such a framework, we are able to
predict which algebraic structures should appear in the agent's representation at the end
of the learning process. Being able to predict the structures that should be present in an
agent's representation in certain worlds and using certain learning algorithms would be
a powerful explanatory tool. For example, if there is a sharp improvement in an agent's
performance at a task at a certain point of learning, it could be the case that certain alge-
braic structures are present in the agent's representation after the sharp improvement in
performance that were not present before the sharp improvement; if so, then it could be
argued that the sharp increase in performance is due to the "discovery" of the algebraic
structure in the agent's representation.
We aim to help answer the question of which features should be present in a "good"
representation by, as suggested by [1], looking at the transformation properties of worlds.
However, while [1] only considered symmetry transformations, we aim to go further and
consider the full algebra of various worlds. We propose a mathematical framework to
describe the transformations of the world and therefore describe the features we expect
to find in the representation of an artificial agent. We derive [1]'s SBDRs using our
framework; we then use category theory to generalise elements of their work, namely
their equivariance condition and their definition of disentangling, to worlds where the
transformations of the world cannot be described using [1]'s framework. This paper aims
to make theoretical contributions to representation learning and does not propose new
learning algorithms. More specifically, our contributions are the following:
1. We propose a general mathematical framework for describing the transformation
of worlds due to the actions of an agent that can interact with the world.
2. We derive the SBDRs proposed by [1] from our framework and in doing so identify
the limitations of SBDRs in their current form.
3. We use our framework to explore the structure of the transformations of worlds
for classes of worlds containing features found in common reinforcement learning
scenarios. Our contributions are to the field of representation learning and not
the field of reinforcement learning. We also present the code used to work out the
algebra of the transformations of worlds due to the actions of an agent.
4. We generalise the equivariance condition and the definition of disentangling given
by [1] to worlds that do not satisfy the conditions for SBDRs. This generalisation
is performed using category theory.
This paper is structured as follows: In Section 2 we define our framework and then
describe how it deals with generalised worlds, which consist of distinct world states
connected by transitions that describe the dynamics of the world. We define transitions
and some of their relevant properties. Then we define the actions of an agent in our
framework. In Section 3 our framework is used to reproduce the SBDRL framework
given by [1]. This is achieved by defining an equivalence relation that makes the actions
of an agent equivalent if the actions produce the same outcome when performed while the
world is in any state. In Section 4, we apply our framework to worlds exhibiting common
reinforcement learning scenarios that cannot be described fully using SBDRs and study
the algebraic structures exhibited by the dynamics of these worlds. In Section 5, we
generalise two important results of [1] - the equivariance condition and the disentangling
definition - to worlds with transformations with algebras that do not fit into the SBDRL
paradigm. We finish with a discussion in Section 6.
2. A mathematical framework for an agent in a world
We now introduce our general mathematical framework for formally describing the
transformations of a world. We begin by considering a generalised discrete world con-
sisting of a set of discrete states, which are distinguishable in some way, and a set of
transitions between those states; these transitions are the world dynamics (i.e., how the
world can transform from one world state to another world state). This world can be
represented by a directed multigraph, where the world states are the vertices of the graph
and the world dynamics are arrows between the vertices. We will use this framework to
reproduce the group action structure of the dynamics of a world in an artificial agent's
representation as described by [1], and in doing so we uncover the requirements for this
group action structure to be present in a world.
2.1. Simplifications
Fully observable and partially observable worlds. Each observation of the world could
contain all the information about the state of the distribution at the instance the obser-
vation is taken - the world is fully observable - or only some of the information about the
state of the distribution at the instance the observation is taken - the world is partially
observable. Partially observable worlds are common in many AI problems. However,
in this work, we explore the structure of representations of agents in fully observable
worlds. Our focus is on fully observable worlds because (1) their treatment is simpler
than that of partially observable worlds, making them a good starting point, and (2) the ideal
end result for a world being explored through partial observability should be as close as
possible to the end result for the same world being explored through full observability, so
in identifying structures that should be present in the representations of fully observable
worlds we are also identifying structures that should ideally be present in the represen-
tations of partially observable worlds, without having to consider the complications of partial
observability.
Discrete vs continuous worlds. For simplicity, we only consider worlds made of discrete
states. However, we argue that it is actually more natural to consider discrete worlds.
Consider an agent with $n$ sensors $o_i$ that can interact with a world $W$ using continuous
actions $A$ to produce sensor observations $o_i(w)$, where $i \in \{1, \dots, n\}$ and $w \in W$. We
hypothesise that there will be actions $a_k \in A$ that cause changes in the world state that
are so small that the agent's sensors will not perceive the change (i.e., $o_i(a_k \cdot w) = o_i(w)$).
Therefore, there will be discrete jumps between perceptible states of the world in the
agent's representation.
2.2. Mathematical model of a world
We consider a world as a set of discrete world states, which are distinguishable in
some way, and a set of world state transitions between those states; these world state
transitions are the transformations of the world (i.e., how a world state can transform
into another world state). We point out that this description of a world is isomorphic
to a directed multigraph, where the world states are the vertices of the graph and the
world state transitions are arrows between the vertices.
2.2.1. World states and world state transitions
We believe that defining a world as a discrete set of world states with world state
transitions between them is the most general definition of a world, and so take it as our
starting point from which we will build towards defining the algebra of the actions of an
agent. We are going to use these transitions to define the actions of an agent.
Transitions. We consider a directed multigraph $\mathcal{W} = (W, \tilde{D}, s, t)$ where $W$ is a set of
world states, $\tilde{D}$ is a set of minimum world state transitions, and $s, t: \tilde{D} \to W$; $s$ is called
the source map and $t$ is called the target map. For the remainder of the paper, we fix
such a $(W, \tilde{D}, s, t)$. $\mathcal{W}$ is called a world.
Minimum world state transitions are extended into a set $D$ of paths called world state
transitions: a path is a sequence of minimum world state transitions $d = \tilde{d}_n \circ \tilde{d}_{n-1} \circ \dots \circ \tilde{d}_1$
such that $t(\tilde{d}_i) = s(\tilde{d}_{i+1})$ for $i = 1, \dots, n-1$. We extend $s, t$ to $D$ as $s(d) = s(\tilde{d}_1)$ and
$t(d) = t(\tilde{d}_n)$. We also extend the composition operator $\circ$ to $D$ such that $d_n \circ d_{n-1} \circ \dots \circ d_1$
is defined if $t(d_i) = s(d_{i+1})$ for $i = 1, \dots, n-1$. For $d \in D$ with $s(d) = w$ and $t(d) = w'$, we
will often denote $d$ by $d: w \to w'$.
For the rest of the paper, we assume that, for each world state $w \in W$, there is a
unique trivial world state transition $1_w \in \tilde{D}$ with $s(1_w) = t(1_w)$; the trivial transition $1_w$
is associated with the world being in state $w$ and then no change occurring due to the
transition $1_w$.
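To make the preceding definitions concrete, the following is a minimal sketch in Python of a world as a directed multigraph with minimum transitions, source and target maps, and path composition. The class and field names (World, Transition, and so on) are our own illustrative choices and are not taken from the paper's code.

from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    """A minimum world state transition with a source state and a target state."""
    source: str
    target: str
    label: str  # used later, when transitions are labelled with actions

class World:
    """A directed multigraph (W, D~, s, t): states are vertices, minimum transitions
    are arrows. Paths of composable minimum transitions form the set D of world
    state transitions."""

    def __init__(self, states, transitions):
        self.states = set(states)
        self.transitions = list(transitions)
        # Every state carries a unique trivial transition 1_w with s(1_w) = t(1_w).
        for w in self.states:
            self.transitions.append(Transition(w, w, "1"))

    def source(self, path):
        return path[0].source       # s(d) = s(d_1)

    def target(self, path):
        return path[-1].target      # t(d) = t(d_n)

    def compose(self, *path):
        """Return the path d_n o ... o d_1 (given in application order) if consecutive
        targets and sources match; otherwise the composite is undefined."""
        for d_i, d_next in zip(path, path[1:]):
            if d_i.target != d_next.source:
                raise ValueError("path is not composable: t(d_i) != s(d_{i+1})")
        return list(path)

# Example usage: a two-state world with a single non-trivial minimum transition.
w = World({"w0", "w1"}, [Transition("w0", "w1", "R")])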
Connected and disconnected worlds. We now introduce connected and disconnected worlds.
Simply, a world $A$ is connected to a world $B$ if there is a transition from a world state
in world $A$ to a world state in world $B$. The concepts of connected and disconnected
worlds are necessary for generality; we are only interested in the perspective of the agent
and so only care about the world states and transitions that the agent can come into
contact with. Connected and disconnected worlds give us the language to describe and
then disregard the parts of worlds that the agent will never explore and therefore are not
relevant to the agent's representation. For example, if an agent is in a maze and a section
of the maze is inaccessible from the position that the agent is in, then that section of the
maze would be disconnected from the section of the maze that the agent is in; if we want
to study how the agent's representation evolves as it learns, it makes sense to disregard
the disconnected section of the maze since the agent never comes into contact with it
and so the disconnected section of the maze will not affect the agent's representation.
Formally, we first define a sub-world $W'$ of a world $W$ as a subset $W' \subseteq W$ along with
$D' = \{d \in D \mid s(d) \in W' \text{ and } t(d) \in W'\}$. Note that a sub-world is a world. A sub-world
$W$ is connected to a sub-world $W'$ if there exists a transition $d: w \to w'$ where $w \in W$
and $w' \in W'$; if no such transition exists, then $W$ is disconnected from $W'$. Similarly, a
world state $w$ is connected to a sub-world $W'$ if there exists a transition $d: w \to w'$ where
$w' \in W'$; if no such transition exists, then $w$ is disconnected from $W'$.
Effect of transitions on world states. We define $\cdot$ as a partial function $D \times W \to W$ by
$d \cdot w = w'$ where $d: w \to w'$, and undefined otherwise.
2.2.2. Example
We consider a cyclical 2 x 2 grid world, denoted by $\mathcal{W}_C$, containing an agent as shown
in Figure 1. The transformations of $\mathcal{W}_C$ are due to an agent moving either up ($U$), down
($D$), left ($L$), right ($R$), or doing nothing ($1$). The possible world states of $\mathcal{W}_C$ are shown
in Figure 1. $\mathcal{W}_C$, and variations of it, is used as a running example to illustrate the
concepts presented in this paper.
We say the world being cyclical means that if the agent performs the same action
enough times, then the agent will return to its starting position; for example, for the
world $\mathcal{W}_C$, if the agent performs the action $U$ twice when the world is in state $w_0$ in
Figure 1 then the world will transition into the state $w_0$ (i.e., $U^2 \cdot w_0 = w_0$). The
transition due to performing each action in each state can be found in Table 1.

        1     U     D     L     R
 w0    w0    w2    w2    w1    w1
 w1    w1    w3    w3    w0    w0
 w2    w2    w0    w0    w3    w3
 w3    w3    w1    w1    w2    w2

Table 1: Each entry in this table shows the outcome state of the agent performing the action given in
the column label when in the world state given by the row label.
The transitions shown in Table 1 can be represented as the transition diagram given in
Figure 2. It should be noted that, since the structure of the diagram is wholly dependent
on the arrows between the world states, the positioning of the world states is an arbitrary
choice.
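As an illustration, the transitions in Table 1 can be written down directly as a lookup table. The dictionary below is a hypothetical encoding of our own (not taken from the paper's code); it applies action sequences rightmost-first, so performing $U$ twice from $w_0$ returns $w_0$, matching the cyclical behaviour described above.

# Outcome state for each (state, minimum action) pair, transcribed from Table 1.
TABLE_1 = {
    "w0": {"1": "w0", "U": "w2", "D": "w2", "L": "w1", "R": "w1"},
    "w1": {"1": "w1", "U": "w3", "D": "w3", "L": "w0", "R": "w0"},
    "w2": {"1": "w2", "U": "w0", "D": "w0", "L": "w3", "R": "w3"},
    "w3": {"1": "w3", "U": "w1", "D": "w1", "L": "w2", "R": "w2"},
}

def apply_action(action, state):
    """Apply a sequence of minimum actions to a state, rightmost first,
    mirroring a_n ... a_1 . w = a_n ... a_2 . (a_1 . w)."""
    for minimum_action in reversed(action):
        state = TABLE_1[state][minimum_action]
    return state

assert apply_action("UU", "w0") == "w0"   # performing U twice returns the agent to w0
assert apply_action("LR", "w2") == "w2"   # L after R cancels out in this cyclical world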
2.3. Agents
We now define the actions of agents as collections of transitions that are labelled by
their associated action. We then define some relevant properties of these actions.
(a) $w_0$ (b) $w_1$ (c) $w_2$ (d) $w_3$
Figure 1: The world states of a cyclical 2 x 2 grid world $\mathcal{W}_C$, where changes to the world are due to an
agent moving either up, down, left, or right. The position of the agent in the world is represented by
the position of the circled A.
Figure 2: A transition diagram for the transitions shown in Table 1.
2.3.1. Treatment of agents
We now discuss how agents are treated in our model. We consider worlds containing
an embodied agent, which is able to interact with the environment by performing actions.
The end goal of the agent's learning process is to map the useful aspects of the structure
of the world to the structure of its representation; the useful aspects are those that enable
the agent to complete whatever task it has.
We use the treatment of agents adopted by [1]. The agent has an unspecified number
of unspecified sensory states that allow it to make observations of the state of the envi-
ronment. Information about the world state that the agent is currently in is delivered to
the internal state of the agent through the sensory states. Mathematically, the process
of information propagating through the sensory states is a mapping $b: W \to O$ (the
"observation process"), which produces a set of observations containing a single obser-
vation for each sensory state. These observations are then used by an inference process
$h: O \to Z$ to produce an internal representation. The agent then uses some internal
mechanism to select an action to perform; this action is performed through active states.
The agent can be thought of as having a (non-physical) boundary between its internal
state, containing its internal state representation(s) of the world, and the external world
state. Information about the world state is accessed by the internal state through sensory
states only (the observation process). The agent affects the world using active states.
It is important to note that the agent's state representation only reflects the ob-
servations the agent makes with its sensors; in other words, the agent's internal state
is built using the information about aspects of the world state propagated through its
sensory states, in the form that the observation process provides the information, and
not directly from the world state. For example, the human eye (the sensor) converts
information about the light entering the eye into electrical and chemical energy in the
optic nerve (the observation process) and then the information from the optic nerve is
given to the brain for inference. Information could be lost or modified during the ob-
servation process. For example, light sensors could only pick up certain wavelengths of
light or take in a colour spectrum and output in greyscale. This means that world states
with differences that are not detectable by the agent's sensors are effectively identical
from the agent's perspective and so can be treated as such when we are constructing a
mathematical description of the agent's interaction with the world.
2.3.2. Actions of an agent as labelled transitions
Consider a set $\tilde{A}$ called the set of minimum actions. Let the set $A$ be the set of
all finite sequences formed from the elements of the set $\tilde{A}$; we call $A$ the set of actions.
Consider a set $\tilde{D}_A \subseteq D$, where $1_w \in \tilde{D}_A$ for all $w \in W$; we call $\tilde{D}_A$ the set of minimum
action transitions. We consider a labelling map $\tilde{\ell}: \tilde{D}_A \to \tilde{A}$ such that:
1. Any two distinct transitions leaving the same world state are labelled with different
actions.
Action condition 1. For any $d, d' \in \tilde{D}_A$ with $s(d) = s(d')$, $\tilde{\ell}(d) \neq \tilde{\ell}(d')$.
2. There is an identity action that leaves any world state unchanged.
Action condition 2. There exists an action $1 \in \tilde{A}$ such that $\tilde{\ell}(1_w) = 1$ for all
$w \in W$. We call $1$ the identity action.
Figure 3: Labelling the transitions in Figure 2 with the relevant actions in $A$.
Given $\tilde{D}_A$ as defined above and satisfying action condition 1 and action condition 2,
we define $\mathcal{W}_{D_A} = (W, \tilde{D}_A, s_A, t_A)$, where $s_A, t_A$ are the restrictions of $s, t$ to the set $\tilde{D}_A$.
We now define a set $D_A$, the set of action transitions, which is the set of all paths of
$\mathcal{W}_{D_A}$.
We extend the map $\tilde{\ell}$ to a map $\ell: D_A \to A$ such that if $d = \tilde{d}_n \circ \dots \circ \tilde{d}_1$ then
$\ell(d) = \tilde{\ell}(\tilde{d}_n) \dots \tilde{\ell}(\tilde{d}_1)$. For $d \in D_A$ with $s(d) = w$, $t(d) = w'$ and $\ell(d) = a$, we will often
denote $d$ by $d: w \xrightarrow{a} w'$.
If an action $a \in A$ labels a transition $d \in D_A$ expressed in terms of its minimum
transitions as $d = \tilde{d}_n \circ \dots \circ \tilde{d}_1$, then
$a = \ell(d) = \ell(\tilde{d}_n \circ \dots \circ \tilde{d}_1) = \tilde{\ell}(\tilde{d}_n) \circ \dots \circ \tilde{\ell}(\tilde{d}_1) = \tilde{a}_n \circ \dots \circ \tilde{a}_1$, where the $\tilde{a}_i$ are called
minimum actions.
Remark 2.3.1. For a given $w \in W$, we can label transitions in $D_A$ with an appropriate
element of $A$ through the following: for each $d \in D_A$ with $s(d) = w$, express $d$ in terms
of its minimum transitions in $\tilde{D}_A$ as $d = d_n \circ \dots \circ d_2 \circ d_1$; if $\tilde{\ell}(d_i) = a_i$ then $d$ is labelled
with $a_n \dots a_2 a_1 \in A$. We denote the map that performs this labelling by $\ell: D_A \to A$.
Figure 3 shows how transitions are labelled with actions in our 2 x 2 cyclical world
example. We only show the minimum actions for simplicity, but there are actually infinitely
many action transitions between each pair of world states; for example, the action transitions
from $w_0$ to $w_3$ include those labelled by: $D \circ R$, $D \circ R \circ 1^n$ ($n \in \mathbb{N}$), $1^n \circ D \circ R$ ($n \in \mathbb{N}$),
$D \circ R \circ (L \circ R)^n$ ($n \in \mathbb{N}$), etc.
Effect of actions on world states. We define the effect of the action $a \in A$ on world state
$w \in W$ as the following: if there exists $d \in D_A$ such that $s(d) = w$ and $\ell(d) = a$, then
$a \cdot w = t(d)$; if there does not exist $d \in D_A$ such that $s(d) = w$ and $\ell(d) = a$, then we say
that $a \cdot w$ is undefined. The effect of actions on world states is well-defined due to action
condition 1. We can apply the minimum actions that make up an action to world states
individually: if $a \cdot w$ is defined and $a = \tilde{a}_n \dots \tilde{a}_1$ then $a \cdot w = (\tilde{a}_n \dots \tilde{a}_1) \cdot w = \tilde{a}_n \dots \tilde{a}_2 \cdot (\tilde{a}_1 \cdot w)$.
Physically, the identity action $1 \in A$ corresponds to the no-op action (i.e., the world state
does not change due to this action).
Actions as (partial) functions. Consider all the transitions that are labelled by a partic-
ular action $a \in A$. Together these transitions form a partial function $f_a: W \to W$ because
for any $w \in W$ either $a \cdot w$ is undefined, or $a \cdot w$ is defined and there is a unique world
state $w' \in W$ for which $a \cdot w = w'$ (due to action condition 1). $f_a$ is not generally surjective
because for a given $w \in W$ there is not necessarily a transition $d \in D$ with $\ell(d) = a$
and $t(d) = w$. $f_a$ is not generally injective because it is possible to have an environment
where $f_a(w) = f_a(w')$ for some $w \in W$ different from $w' \in W$. We can also reproduce
these functions using the formalism given by [25], which describes the dynamics of the
world in terms of a multivariate function $f: A \times W \to W$. If we let $f: A \times W \to W$
be the dynamics of the environment, then the transition caused by an action $a \in A$ on
a world state $w \in W$ (where $a \cdot w$ is defined) is given by $(a, w) \mapsto f(a, w) = a \cdot w$.
Mathematically, we curry the function $f: A \times W \to W$ to give a collection $\{f_a\}$ of par-
tial functions, with a partial function $\tilde{f}(a) = f_a: W \to W$ for each action $a \in A$:
$\mathrm{Curry}: (f: A \times W \to W) \to (f_a: W \to W)$.
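The currying step can be sketched directly. The snippet below is our own illustrative code, with a toy dynamics dictionary standing in for $f$; it turns the multivariate dynamics $f(a, w)$ into a dictionary of per-action partial functions $f_a$, each of which simply omits the inputs on which it is undefined.

# A toy dynamics f : A x W -> W for minimum actions, given as a dict;
# pairs that are absent represent transitions where a . w is undefined.
DYNAMICS = {
    ("U", "w0"): "w2", ("D", "w0"): "w2", ("L", "w0"): "w1", ("R", "w0"): "w1",
    ("U", "w1"): "w3", ("D", "w1"): "w3", ("L", "w1"): "w0", ("R", "w1"): "w0",
    ("U", "w2"): "w0", ("D", "w2"): "w0", ("L", "w2"): "w3", ("R", "w2"): "w3",
    ("U", "w3"): "w1", ("D", "w3"): "w1", ("L", "w3"): "w2", ("R", "w3"): "w2",
}

def f(action, state):
    """Multivariate dynamics f(a, w) = a . w; raises KeyError where a . w is undefined."""
    return DYNAMICS[(action, state)]

def curry(dynamics, actions, states):
    """Curry f : A x W -> W into {f_a}, one partial function per action,
    represented as a dict from states to states (undefined inputs are omitted)."""
    table = {}
    for a in actions:
        f_a = {}
        for w in states:
            try:
                f_a[w] = dynamics(a, w)
            except KeyError:          # a . w is undefined in this world
                pass
        table[a] = f_a
    return table

f_per_action = curry(f, actions=["U", "D", "L", "R"], states=["w0", "w1", "w2", "w3"])
assert f_per_action["U"]["w0"] == "w2"   # f_U(w0) = U . w0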
3. Reproducing SBDRL
We will now use the framework set out in the previous section to reproduce the SB-
DRs of [1]. We illustrate our ideas using worlds that are similar to those given by
[1] and [25]. We choose to begin by reproducing symmetry-based representations using
our framework because (1) symmetry-based representations describe transformations of
the world that form relatively simple and well-understood algebraic structures
(groups), (2) groups, and the symmetries they describe, are gaining increasing promi-
nence in artificial intelligence research, (3) it shows how our framework encompasses
previous work in formalising the structure of transformations of a world, and (4) it
provides a more rigorous description of SBDRL, which should aid future analysis and
development of the concept.
Section 3.1 provides a description of SBDRL, and Section 3.2 shows how to get to
SBDRL using an equivalence relation on the actions of the agent. Section 3.3 provides
information on the algorithmic exploration of world structures performed on example
worlds. Section 3.3.1 goes through a worked example. Finally, Section 3.4 shows the
conditions of the world that are required for the actions of an agent to be fully described
by SBDRs.
3.1. Symmetry-based disentangled representation learning
We will now present a more detailed description of the SBDRL formalism.
From world states to representation states. The world state is an element of a set $W$
of all possible world states. The observations of a particular world state made by the
agent's sensors are elements of the set $O$ of all possible observations. The agent's internal
state representation of the world state is an element of a set $Z$ of all possible internal
state representations. There exists a composite mapping $f = h \circ b: W \to Z$ that maps
world states to states of the agent's representation ($w \mapsto z$); this composite mapping is
made up of the mapping of an observation process $b: W \to O$ that maps world states
to observations ($w \mapsto o$) and the mapping of an inference process $h: O \to Z$ that maps
observations to the agent's internal state representation ($o \mapsto z$) (see Figure 4).
Figure 4: The composite mapping from the set $W$ of world states to the set $Z$ of state representations
via the set $O$ of observations.
Groups and symmetries.
Definition 3.1 (Group). A group $G$ is a set with a binary operation $G \times G \to G$,
$(g, g') \mapsto g \circ g'$ called the composition of group elements that satisfies the following
properties:
1. Closure. $g \circ g'$ is defined for all $g, g' \in G$.
2. Associativity. $(g \circ g') \circ g'' = g \circ (g' \circ g'')$ for all $g, g', g'' \in G$.
3. Identity. There exists a unique identity element $1 \in G$ such that $1 \circ g = g \circ 1 = g$
for all $g \in G$.
4. Inverse. For any $g \in G$, there exists $g^{-1} \in G$ such that $g \circ g^{-1} = g^{-1} \circ g = 1$.
Applying symmetries to objects is mathematically defined as a group action.
Definition 3.2 (Group action). Given a group $G$ and a set $X$, a group action of $G$ on
$X$ is a map $G \times X \to X$, $(g, x) \mapsto g \cdot x$ that satisfies the following properties:
1. Compatibility with composition. The composition of group elements and the group
action are compatible: $g' \cdot (g \cdot x) = (g' \circ g) \cdot x$ for $g, g' \in G$ and $x \in X$.
2. Identity. The group identity $1 \in G$ leaves the elements of $X$ unchanged: $1 \cdot x = x$
for all $x \in X$.
Another important property of groups is commutation. Two elements of a group
commute if the order they are composed in does not matter: $g \circ g' = g' \circ g$. If all elements
in a group commute with each other then the group is called commutative. Subgroups
of a group might commute with each other.
Symmetry-based representations. The set $W$ of world states has a set of symmetries that
are described by the group $G$. This group $G$ acts on the set $W$ of world states via a
group action $\cdot_W: G \times W \to W$. For the agent's representations $z \in Z$ to be symmetry-
based representations, a corresponding group action $\cdot_Z: G \times Z \to Z$ must be found so
that the symmetries of the agent's representations reflect the symmetries of the world
states. The mathematical condition for this is that, for all $w \in W$ and all $g \in G$, applying
the action $g \cdot_W$ to $w$ and then applying the mapping $f$ gives the same result as first
applying the mapping $f$ to $w$ to give $f(w)$ and then applying the action $g \cdot_Z$ to $f(w)$.
Mathematically, this is $f(g \cdot_W w) = g \cdot_Z f(w)$. If this condition is satisfied, then $f$ is
called a group-equivariant map.
Symmetry-based disentangled representations. To go from symmetry-based representa-
tions to symmetry-based disentangled representations, suppose the group of symmetries
$G$ of the set $W$ of world states decomposes as a direct product $G = G_1 \times \dots \times G_i \times \dots \times G_n$.
The group action $\cdot_Z: G \times Z \to Z$ and the set $Z$ are disentangled with respect to the
decomposition of $G$ if there is a decomposition $Z = Z_1 \times \dots \times Z_i \times \dots \times Z_n$ and actions
$\cdot_{Z_i}: G_i \times Z_i \to Z_i$, $i \in \{1, \dots, n\}$, such that $(g_{G_1}, g_{G_2}) \cdot_Z (z_{Z_1}, z_{Z_2}) = (g_{G_1} \cdot_{Z_1} z_{Z_1}, g_{G_2} \cdot_{Z_2} z_{Z_2})$,
where $g_{G_i} \in G_i$ and $z_{Z_i} \in Z_i$. In other words, each subspace $Z_i$ is invariant to the action
of all the $G_{j \neq i}$ and only affected by $G_i$.
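As a small worked illustration of this definition (our own example, not taken from [1]): take $G = G_1 \times G_2$ with $G_1$ acting on the horizontal coordinate and $G_2$ on the vertical coordinate of a point on a 4 x 4 torus, and $Z = Z_1 \times Z_2$ the two coordinates. The componentwise action sketched below is disentangled with respect to this decomposition, since each $Z_i$ is only affected by $G_i$.

N = 4  # a 4 x 4 toroidal grid; G1 = Z_4 acts on the x coordinate, G2 = Z_4 on y

def act(g, z):
    """Group action of G = G1 x G2 on Z = Z1 x Z2, applied componentwise:
    (g1, g2) . (z1, z2) = (g1 .1 z1, g2 .2 z2)."""
    (g1, g2), (z1, z2) = g, z
    return ((z1 + g1) % N, (z2 + g2) % N)

# Compatibility with composition: g' . (g . z) == (g' o g) . z, where composition
# in G1 x G2 is componentwise addition modulo N.
g, g_prime, z = (1, 0), (0, 3), (2, 2)
composed = ((g[0] + g_prime[0]) % N, (g[1] + g_prime[1]) % N)
assert act(g_prime, act(g, z)) == act(composed, z)
# Each subspace is untouched by the other factor: G1 leaves z2 fixed.
assert act((1, 0), z)[1] == z[1]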
Summary. The representations in $Z$ are symmetry-based disentangled with respect to
the decomposition $G = G_1 \times \dots \times G_i \times \dots \times G_n$, where each $G_i$ acts on a disjoint part of
$Z$, if:
1. There exists a group action $\cdot_W: G \times W \to W$ and a corresponding group action
$\cdot_Z: G \times Z \to Z$;
2. The map $f: W \to Z$ is group-equivariant between the group actions on $W$ and $Z$:
$g \cdot_Z f(w) = f(g \cdot_W w)$. In other words, the diagram

    w  ---- g ._W ---->  g ._W w
    |                       |
    f                       f
    v                       v
   f(w) ---- g ._Z ---->  g ._Z f(w) = f(g ._W w)

commutes.
3. There exists a decomposition of the representation $Z = Z_1 \times \dots \times Z_n$ such that each
subspace $Z_i$ is unaffected by the action of all $G_{j \neq i}$ and is only affected by $G_i$.
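The equivariance condition in point 2 can be checked numerically for a candidate mapping $f$. The sketch below is illustrative only, with a hand-picked cyclic world and representation of our own choosing; it verifies $f(g \cdot_W w) = g \cdot_Z f(w)$ for every state and every group element.

N = 4
states = list(range(N))          # W: positions on a cycle of length N
group = list(range(N))           # G = Z_N, acting by translation

def act_W(g, w):
    return (w + g) % N           # g ._W w

def f(w):
    return (2 * w) % (2 * N)     # a toy representation map W -> Z (even points on a larger cycle)

def act_Z(g, z):
    return (z + 2 * g) % (2 * N) # the corresponding action g ._Z z on the representation

equivariant = all(f(act_W(g, w)) == act_Z(g, f(w)) for g in group for w in states)
assert equivariant               # f is a group-equivariant map for these two actions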
Limitations of SBDRL. Both [1] and [25] suggest that these group actions can be used
to describe some types of real-world actions. However, it is important to note that they
do not believe that all actions can be described by their formalism: "It is important to
mention that not all actions are symmetries, for instance, the action of eating a collectible
item in the environment is not part of any group of symmetries of the environment because
it might be irreversible." [25, page 4].
3.2. SBDRL through equivalence
For the algebra of the actions of our agent to form a group, we need some sense of
actions being the same so that the algebra can satisfy the group properties (e.g., for the
identity property we need an element $1$ in the algebra $A$ such that $1a = a1 = a$ for any
$a \in A$). We define an equivalence relation on the elements of $A$ that says two actions
are equivalent (our sense of the actions being the same) if they lead to the same end
world state when performed in any initial world state. This equivalence relation is based
on our mathematical interpretation of the implication given by [1] that transformations
of the world are the same if they have the same effect, which is used to achieve the
group structure for SBDRL. Our use of an equivalence relation was inspired by [65],
which uses a similar equivalence relation to equate action sequences that cause the same
final observation state after each action sequence is performed from an initial observation
state. We then derive some properties of the equivalence classes created by $\sim$ that will be
used to show that the actions of an agent form the group action described by [1] under
the equivalence relations we define and for worlds satisfying certain conditions.
Definition 3.3 (Equivalence of actions under $\sim$). Given two actions $a, a' \in A$, we denote
$a \sim a'$ if $a \cdot w = a' \cdot w$ for all $w \in W$.
Remark 3.2.1. If $a \sim a'$, then for each $w \in W$ either (1) there exist transitions
$d: w \xrightarrow{a} t(d)$ and $d': w \xrightarrow{a'} t(d)$, or (2) there exist no transitions $d: w \xrightarrow{a} t(d)$ or
$d': w \xrightarrow{a'} t(d)$.
Proposition 3.1. $\sim$ is an equivalence relation.
Proof. Reflexive. $a \cdot w = a \cdot w$ for all $w \in W$, and so $a \sim a$.
Transitive. If $a \sim a'$ and $a' \sim a''$, then $a \cdot w = a' \cdot w$ for all $w \in W$ and $a' \cdot w = a'' \cdot w$
for all $w \in W$. Therefore, $a \cdot w = a'' \cdot w$ for all $w \in W$ and so $a \sim a''$.
Symmetric. If $a \sim a'$, then $a \cdot w = a' \cdot w$ for all $w \in W$. Therefore $a' \cdot w = a \cdot w$ for
all $w \in W$, and so $a' \sim a$. □
Figure 5 shows the effect of applying the equivalence relation to our 2 x 2 cyclical
example world $\mathcal{W}_C$.
We define the canonical projection map $p_A: A \to A/\sim$ that sends actions in $A$ to
their equivalence classes under $\sim$ in the set $A/\sim$. We denote the equivalence class of $a$
by $[a]_\sim$. Sometimes we will drop the $[a]_\sim$ in favour of $a \in A/\sim$ for ease.
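The equivalence classes of $\sim$ can be computed by brute force for a small world: two action sequences are placed in the same class exactly when they send every world state to the same outcome state. The sketch below is our own code, using the Table 1 dynamics and only enumerating sequences up to length three; it recovers the four classes listed later in Table 4.

from itertools import product

TABLE_1 = {
    "w0": {"1": "w0", "U": "w2", "D": "w2", "L": "w1", "R": "w1"},
    "w1": {"1": "w1", "U": "w3", "D": "w3", "L": "w0", "R": "w0"},
    "w2": {"1": "w2", "U": "w0", "D": "w0", "L": "w3", "R": "w3"},
    "w3": {"1": "w3", "U": "w1", "D": "w1", "L": "w2", "R": "w2"},
}
STATES = list(TABLE_1)
MIN_ACTIONS = ["1", "U", "D", "L", "R"]

def apply_action(action, state):
    for minimum_action in reversed(action):   # rightmost minimum action acts first
        state = TABLE_1[state][minimum_action]
    return state

def effect(action):
    """The effect of an action is the tuple (a . w) over all w; a ~ a' iff effects match."""
    return tuple(apply_action(action, w) for w in STATES)

classes = {}                                   # effect -> list of equivalent action strings
for length in range(1, 4):                     # all action sequences up to length 3
    for seq in product(MIN_ACTIONS, repeat=length):
        action = "".join(seq)
        classes.setdefault(effect(action), []).append(action)

print(len(classes))                            # 4 equivalence classes for this world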
Composition of actions. We define the composition of elements in $A/\sim$ as
$\circ: (A/\sim) \times (A/\sim) \to (A/\sim)$ such that $[a']_\sim \circ [a]_\sim = [a' \circ a]_\sim$ for $a, a' \in A$.
Proposition 3.2. $[a']_\sim \circ [a]_\sim = [a' \circ a]_\sim$ is well-defined for all $a, a' \in A$.
Proof. We need to show that the choice of $a, a'$ does not matter: if $a_1 \sim a_2$ and $a_3 \sim a_4$
for $a_1, a_2, a_3, a_4 \in A$, then $[a_3 \circ a_1]_\sim = [a_4 \circ a_2]_\sim$. $a_1 \sim a_2$ means there exist
$d_i: s(d_1) \xrightarrow{a_i} t(d_1)$ for $i = 1, 2$. Since actions are unrestricted in $W$, for any world state and
any action there is a transition with a source at that world state that is labelled by that
action. $a_3 \sim a_4$ means there exist $d_i: s(d_3) \xrightarrow{a_i} t(d_3)$ for $i = 3, 4$, and so there exist
$d_i: t(d_1) \xrightarrow{a_i} t(d_3)$ for $i = 3, 4$. $\Rightarrow$ there exist $(d_3 \circ d_1): s(d_1) \xrightarrow{a_3 \circ a_1} t(d_3)$ and
$(d_4 \circ d_2): s(d_1) \xrightarrow{a_4 \circ a_2} t(d_3)$. $\Rightarrow$ $s(d_3 \circ d_1) = s(d_4 \circ d_2)$ and $t(d_3 \circ d_1) = t(d_4 \circ d_2)$. $\Rightarrow$
$(a_3 \circ a_1) \sim (a_4 \circ a_2)$. $\Rightarrow$ $[a_3 \circ a_1]_\sim = [a_4 \circ a_2]_\sim$. □
Figure 5: Action equivalence classes in $A/\sim$ for the actions shown in Figure 3.
Effect of equivalent actions on world states. We define the effect of an element of $A/\sim$
on world states as $\cdot: (A/\sim) \times W \to W$ such that $[a]_\sim \cdot w = a \cdot w$. Note that this is only
defined if there exists $d: w \xrightarrow{a} t(d)$ for $d \in D_A$; if not, then $[a]_\sim \cdot w$ is called undefined.
Proposition 3.3. $[a]_\sim \cdot w$ is well-defined for all $a \in A$ and for all $w \in W$.
Proof. We need to show that $a_1 \cdot w = a_2 \cdot w$ if $[a_1]_\sim = [a_2]_\sim$ for $a_1, a_2 \in A$ and $w \in W$.
If $[a_1]_\sim = [a_2]_\sim$, then $a_1 \sim a_2$. Since actions are unrestricted in $W$, for any world state
and any action there is a transition with a source at that world state that is labelled by
that action. $\Rightarrow$ there exist $d_i: w \xrightarrow{a_i} t(d_1)$ for $i = 1, 2$. $\Rightarrow$ $a_1 \cdot w = t(d_1)$
and $a_2 \cdot w = t(d_1)$. □
Reversible actions. An action $a \in A$ is called reversible in a given state $w \in W$ if $a \in A^{r}_{w}$,
where $A^{r}_{w} = \{a \in A \mid \text{there exists } a' \in A \text{ such that } a' \circ a \cdot w = w\}$. An action $a \in A$
is called reversible if it is reversible in every $w \in W$. An action that is not reversible is
called irreversible.
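Reversibility can also be checked mechanically. The sketch below is illustrative only: it uses a hypothetical two-action world containing a collectible item (echoing the "eating a collectible item" example quoted above), and, for brevity, it only searches over single minimum actions as the candidate inverse $a'$ rather than over all action sequences as the definition allows.

def reversible_in(a, w, actions, step):
    """a is reversible in w if some a' satisfies a' . (a . w) = w.
    `step` applies a single action and returns None where it is undefined."""
    after = step(a, w)
    if after is None:
        return False
    return any(step(a_prime, after) == w for a_prime in actions)

# A two-state world: 'eat' destroys the collectible and cannot be undone.
DYNAMICS = {
    ("noop", "item_present"): "item_present",
    ("eat", "item_present"): "item_eaten",
    ("noop", "item_eaten"): "item_eaten",
    ("eat", "item_eaten"): "item_eaten",
}
step = lambda a, w: DYNAMICS.get((a, w))
ACTIONS = ["noop", "eat"]

assert reversible_in("noop", "item_present", ACTIONS, step)      # no-op is reversible
assert not reversible_in("eat", "item_present", ACTIONS, step)   # eating cannot be undone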
Properties of the quotient set $A/\sim$.
Proposition 3.4. $(A/\sim, \circ)$ has an identity element.
Proof. To show that $(A/\sim, \circ)$ has an identity element we can show that there is an element
$e \in A$ which satisfies (a) $[a]_\sim \circ [e]_\sim = [a]_\sim$ and (b) $[e]_\sim \circ [a]_\sim = [a]_\sim$ for all $a \in A$. We
will prove that the identity action $1 \in A$ satisfies the above condition. Consider any
transition $d: s(d) \xrightarrow{a} t(d)$ labelled by any action $a \in A$.
(a) There exists a transition $1_{s(d)}: s(d) \xrightarrow{1} s(d)$ due to action condition 2. $t(1_{s(d)}) =
s(d)$ $\Rightarrow$ $a \circ 1$ is defined for $d$. $s(a \circ 1) = s(1_{s(d)}) = s(d)$ and $t(a \circ 1) = t(d)$. $\Rightarrow$
$a \circ 1 \sim a$. $\Rightarrow$ $[a \circ 1]_\sim = [a]_\sim$. $\Rightarrow$ $[a]_\sim \circ [1]_\sim = [a]_\sim$.
(b) There exists a transition $1_{t(d)}: t(d) \xrightarrow{1} t(d)$ due to action condition 2. $s(1_{t(d)}) =
t(d)$ $\Rightarrow$ $1 \circ a$ is defined for $d$. $s(1 \circ a) = s(d)$ and $t(1 \circ a) = t(1_{t(d)}) = t(d)$. $\Rightarrow$ $1 \circ a \sim a$.
$\Rightarrow$ $[1 \circ a]_\sim = [a]_\sim$. $\Rightarrow$ $[1]_\sim \circ [a]_\sim = [a]_\sim$. Therefore $1 \in A$ satisfies the conditions for
$[1]_\sim$ being an identity element in $(A/\sim, \circ)$. □
Proposition 3.5. $\circ$ is associative with respect to $(A/\sim, \circ)$.
Proof. For $\circ$ to be associative we need $[a_1]_\sim \circ ([a_2]_\sim \circ [a_3]_\sim) = ([a_1]_\sim \circ [a_2]_\sim) \circ [a_3]_\sim$ for
any $a_1, a_2, a_3 \in A$. We have $a_1 \circ (a_2 \circ a_3) = (a_1 \circ a_2) \circ a_3$ from the associativity of $\circ$ with
respect to $(A, \circ)$, and $[a']_\sim \circ [a]_\sim = [a' \circ a]_\sim$ for any $a, a' \in A$ by definition of $\circ$ on $A/\sim$.
$\Rightarrow$ $[a_1]_\sim \circ ([a_2]_\sim \circ [a_3]_\sim) = [a_1 \circ (a_2 \circ a_3)]_\sim = [(a_1 \circ a_2) \circ a_3]_\sim = [(a_1 \circ a_2)]_\sim \circ [a_3]_\sim =
([a_1]_\sim \circ [a_2]_\sim) \circ [a_3]_\sim$. □
In summary, we have $(A, \circ, \cdot)$, which is a set $A$ along with two operators $\circ: A \times A \to A$
and $\cdot: A \times W \to W$, and we have $(A/\sim, \circ, \cdot)$, which is a set $A/\sim$ along with two operators
$\circ: (A/\sim) \times (A/\sim) \to (A/\sim)$ and $\cdot: (A/\sim) \times W \to W$. We have shown that $\circ$ is associative
with respect to $(A/\sim, \circ)$, and that $(A/\sim, \circ)$ has an identity element by action condition
2.
3.3. Algorithmic exploration of world structures
To gain an intuition for the structure of different worlds and to illustrate our theo-
retical work with examples, we developed an algorithm that uses an agent's minimum
actions to generate the algebraic structure of the transformations of a world due to the
agent's actions. We display this structure as a generalised Cayley table (a multiplication
table for the distinct elements of the algebra). An implementation of this algorithm can be
found at github.com/awjdean/CayleyTableGeneration.
Cayley table generating algorithm. First, we generate what we call a state Cayley table
(Algorithm 1). The elements of this state Cayley table are the world states reached when
the row action and then the column action are performed in succession from an initial
world state $w$ (i.e., column action $\cdot$ (row action $\cdot$ $w$)). Once the state Cayley table has
been generated, we use it to generate the action Cayley table, in which each element
of the table is the element of the algebra equivalent to performing the row
action followed by the column action (Algorithm 2).
Algorithm 1 Generate state Cayley table.
Require: minimum_actions: a list of minimum actions, w: initial world state.
1: state_cayley_table ← an empty square matrix with dimensions
len(minimum_actions) × len(minimum_actions), with rows and columns labelled by
minimum_actions.
2: for a in minimum_actions do
3:   Create an equivalence class for a.
4:   state_cayley_table ← AddElementToStateCayleyTable(state_cayley_table, w, a). ▷ See Algorithm 3.
5: end for
6: for row_label in state_cayley_table do
7:   equivalents_found ← SearchForEquivalents(state_cayley_table, w, row_label). ▷ See Algorithm 5.
8:   if len(equivalents_found) ≠ 0 then
9:     Merge the equivalence classes of equivalent minimum actions.
10:    Delete the row and column from state_cayley_table for the minimum actions
       not labelling the merged equivalence class.
11:  end if
12: end for
13: Initialize an empty list candidate_cayley_table_elements.
14: candidate_cayley_table_elements ← SearchForNewCandidates(state_cayley_table,
    w, candidate_cayley_table_elements). ▷ See Algorithm 4.
15: while len(candidate_cayley_table_elements) > 0 do
16:   a_C ← pop an element from candidate_cayley_table_elements.
17:   equivalents_found ← SearchForEquivalents(state_cayley_table, w, a_C). ▷ See Algorithm 5.
18:   if len(equivalents_found) ≠ 0 then
19:     Add a_C to the relevant equivalence class.
20:     Continue to the next iteration of the while loop.
21:   else
22:     Check if a_C breaks any of the existing equivalence classes.
23:     broken_equivalence_classes ← SearchForBrokenEquivalenceClasses(state_cayley_table,
        w, a_C). ▷ See Algorithm 6.
24:     if len(broken_equivalence_classes) ≠ 0 then
25:       for each new equivalence class do
26:         state_cayley_table ← AddElementToStateCayleyTable(state_cayley_table,
            w, element labelling the new equivalence class). ▷ See Algorithm 3.
27:       end for
28:     end if
29:     Create a new equivalence class for a_C.
30:     state_cayley_table ← AddElementToStateCayleyTable(state_cayley_table,
        w, a_C). ▷ See Algorithm 3.
31:   end if
32:   candidate_cayley_table_elements ← SearchForNewCandidates(state_cayley_table,
      w, candidate_cayley_table_elements). ▷ See Algorithm 4.
33: end while
34: return state_cayley_table
Algorithm 2 Generate action Cayley table.
Require: state_cayley_table.
1: action_cayley_table ← an empty square matrix with the dimensions of
state_cayley_table, with rows and columns labelled by the rows and columns of
state_cayley_table.
2: for row_label in action_cayley_table do
3:   for column_label in action_cayley_table do
4:     a_C ← column_label ∘ row_label.
5:     ec_label ← label of the equivalence class containing a_C.
6:     action_cayley_table[row_label][column_label] = ec_label.
7:   end for
8: end for
9: return action_cayley_table.
Algorithm 3 AddElementToStateCayleyTable: Fill the state Cayley table row and column
for element a.
Require: state_cayley_table, w: initial world state, a.
1: Add a new row and a new column labelled by a to state_cayley_table.
2: for column_label in state_cayley_table do
3:   state_cayley_table[a][column_label] ← column_label · (a · w).
4: end for
5: for row_label in state_cayley_table do
6:   state_cayley_table[row_label][a] ← a · (row_label · w).
7: end for
Algorithm 4 SearchForNewCandidates: Search for new candidate elements in the state
Cayley table.
Require: state_cayley_table, w: initial world state, candidate_cayley_table_elements.
1: for row_label in state_cayley_table do
2:   for column_label in state_cayley_table do
3:     a_C ← column_label ∘ row_label.
4:     equivalents_found ← SearchForEquivalents(state_cayley_table, w, a_C). ▷ See Algorithm 5.
5:     if len(equivalents_found) ≠ 0 then
6:       Add a_C to the relevant equivalence class.
7:     else
8:       Add a_C to candidate_cayley_table_elements.
9:     end if
10:   end for
11: end for
12: return candidate_cayley_table_elements.
Algorithm 5 SearchForEquivalents: Search for elements in the Cayley table that are equiv-
alent to a.
Require: state_cayley_table, w: initial world state, a.
1: equivalents_found ← empty list.
2: a_row ← empty list. ▷ Generate the state Cayley row for a.
3: for column_label in state_cayley_table do
4:   if column_label == a then
5:     Continue to the next iteration of the for loop.
6:   end if
7:   Append column_label · (a · w) to a_row.
8: end for
9: a_column ← empty list. ▷ Generate the state Cayley column for a.
10: for row_label in state_cayley_table do
11:   if row_label == a then
12:     Continue to the next iteration of the for loop.
13:   end if
14:   Append a · (row_label · w) to a_column.
15: end for
16: for row_label in state_cayley_table do
17:   if (a_row, a_column) == (the row of row_label, the column of row_label) then
18:     Append row_label to equivalents_found.
19:   end if
20: end for
21: return equivalents_found.
Algorithm 6 SearchForBrokenEquivalenceClasses: Find equivalence classes that are
broken by a_C.
Require: state_cayley_table, w: initial world state, a_C.
1: for row_label in state_cayley_table do
2:   ec_label_outcome ← row_label · (a_C · w).
3:   for ec_element in the equivalence class labelled by row_label do
4:     if ec_label_outcome ≠ ec_element · (a_C · w) then
5:       Create a new equivalence class labelled by ec_element.
6:       Remove ec_element from the equivalence class labelled by row_label.
7:     end if
8:   end for
9: end for
10: return: New equivalence classes.
Displaying the algebra. We display the algebra in two ways: (1) a $w$-state Cayley table,
which shows the resulting state of applying the row element to $w$ followed by the column
element (i.e., $w$-state Cayley table value = column label $\cdot$ (row label $\cdot$ $w$)), and (2) an action
Cayley table, which shows the resulting element of the algebra when the column element is
applied to the left of the row element (i.e., action Cayley table value = column element $\circ$
row element).
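A much-simplified version of this procedure for a finite, fully known world can be sketched as follows. This is our own illustrative code, not the repository implementation: it enumerates action sequences up to a fixed length and picks one shortest representative per equivalence class, rather than searching for new candidate elements as Algorithms 1 and 4 do.

from itertools import product

TABLE_1 = {
    "w0": {"1": "w0", "U": "w2", "D": "w2", "L": "w1", "R": "w1"},
    "w1": {"1": "w1", "U": "w3", "D": "w3", "L": "w0", "R": "w0"},
    "w2": {"1": "w2", "U": "w0", "D": "w0", "L": "w3", "R": "w3"},
    "w3": {"1": "w3", "U": "w1", "D": "w1", "L": "w2", "R": "w2"},
}
STATES, MIN_ACTIONS = list(TABLE_1), ["1", "U", "D", "L", "R"]

def apply_action(action, state):
    for m in reversed(action):                        # rightmost minimum action acts first
        state = TABLE_1[state][m]
    return state

def effect(action):
    return tuple(apply_action(action, w) for w in STATES)

# One representative (the first, hence shortest, sequence found) per equivalence class.
reps = {}
for n in range(1, 4):
    for seq in map("".join, product(MIN_ACTIONS, repeat=n)):
        reps.setdefault(effect(seq), seq)
representatives = list(reps.values())

w0 = "w0"
state_table = {r: {c: apply_action(c, apply_action(r, w0)) for c in representatives}
               for r in representatives}              # column . (row . w0), as in Table 2
action_table = {r: {c: reps[effect(c + r)] for c in representatives}
                for r in representatives}             # class of column o row, as in Table 3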
A/∼    1     D     L     RU
1      w0    w2    w1    w3
D      w2    w0    w3    w1
L      w1    w3    w0    w2
RU     w3    w1    w2    w0

Table 2: $w_0$ state Cayley table for $A/\sim$.
A/∼    1     D     L     RU
1      1     D     L     RU
D      D     1     RU    L
L      L     RU    1     D
RU     RU    L     D     1

Table 3: Action Cayley table for $A/\sim$.
Algebra properties. We also check the following properties of the algebra algorithmically:
(1) the presence of an identity, including the presence of left and right identity elements
separately, (2) the presence of inverses, including the presence of left and right inverses
for each element, (3) associativity, (4) commutativity, and (5) the order of each element
in the algebra. For our algorithm to successfully generate the algebra of a world, the
world must contain a finite number of states, the agent must have a finite number of
minimum actions, and all the transformations of the world must be due to the actions
of the agent.
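The property checks themselves are straightforward once an action Cayley table is available. The sketch below is our own illustrative code; it tests totality, identity, inverses, associativity and commutativity for a finite Cayley table given as a dict of dicts, here applied to the abstract table that reappears as Table 5 below (the Klein four-group).

def algebra_properties(table):
    """Check basic properties of a finite algebra given by a Cayley table,
    where table[row][col] is the element equivalent to col o row."""
    elements = list(table)
    compose = lambda col, row: table[row][col]                     # col o row

    total = all(compose(c, r) in elements for r in elements for c in elements)
    identities = [e for e in elements
                  if all(compose(e, a) == a and compose(a, e) == a for a in elements)]
    identity = identities[0] if identities else None
    inverses = identity is not None and all(
        any(compose(b, a) == identity and compose(a, b) == identity for b in elements)
        for a in elements)
    associative = all(compose(a, compose(b, c)) == compose(compose(a, b), c)
                      for a in elements for b in elements for c in elements)
    commutative = all(compose(a, b) == compose(b, a) for a in elements for b in elements)
    return {"totality": total, "identity": identity is not None,
            "inverse": inverses, "associative": associative, "commutative": commutative}

TABLE_5 = {1: {1: 1, 2: 2, 3: 3, 4: 4},
           2: {1: 2, 2: 1, 3: 4, 4: 3},
           3: {1: 3, 2: 4, 3: 1, 4: 2},
           4: {1: 4, 2: 3, 3: 2, 4: 1}}
print(algebra_properties(TABLE_5))   # every property holds, as reported in Table 6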
3.3.1. Example
For our example world $\mathcal{W}_C$, the equivalence classes shown in Figure 5 - those labelled
by $1$, $D$, $L$, and $RU$ - are the only equivalence classes in $A/\sim$. The $w$-state Cayley table in
Table 2 shows the final world state reached after the following operation: table entry =
column element $\cdot$ (row element $\cdot$ $w$).
The $w$-action Cayley table in Table 3 shows the equivalent action in $A/\sim$ for the same
operation as the $w$-state Cayley table: [table entry] $\cdot$ $w$ = column element $\cdot$ (row element $\cdot$
$w$) for all $w \in W$.
The choice of the equivalence class labels in Table 4 is arbitrary; it is better to think
of each equivalence class as a distinct element, as shown in the Cayley table in Table 5.
There are four elements in the action algebra; therefore, if the agent learns the rela-
tions between these four elements, then it has complete knowledge of the transfor-
mations of our example world.
∼ equivalence class label    ∼ equivalence class elements
1                            1, 11, DD, LL, RR, UU, ...
D                            D, D1, 1D, RUL, LRU, ...
L                            L, L1, RUD, 1L, DRU, ...
RU                           RU, RU1, LD, DL, 1RU, ...

Table 4: Action Cayley table equivalence classes.
Property       Present?
Totality       Y
Identity       Y
Inverse        Y
Associative    Y
Commutative    Y

Table 6: Properties of the $A/\sim$ algebra.

Element    Order
1          1
D          2
L          2
RU         2

Table 7: Order of elements in $A/\sim$.
A/∼    1    2    3    4
1      1    2    3    4
2      2    1    4    3
3      3    4    1    2
4      4    3    2    1

Table 5: Abstract action Cayley table for $A/\sim$.
Properties of the $A/\sim$ algebra. The properties of the $A/\sim$ algebra are displayed in Table 6
and show that $A/\sim$ is a commutative group, where the no-op action is the identity and
all elements are their own inverses. Since the action algebra of our example world is a
group, it can be described by SBDRL. The order of each element is given in Table 7.
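Element orders can be read off a Cayley table directly. The short sketch below is our own illustrative code, reusing the abstract table from Table 5; it reproduces the orders reported in Table 7.

def element_order(x, table, identity):
    """Smallest k such that x composed with itself k times equals the identity."""
    power, k = x, 1
    while power != identity:
        power = table[power][x]   # compose another copy of x on the left: x o power
        k += 1
    return k

TABLE_5 = {1: {1: 1, 2: 2, 3: 3, 4: 4},
           2: {1: 2, 2: 1, 3: 4, 4: 3},
           3: {1: 3, 2: 4, 3: 1, 4: 2},
           4: {1: 4, 2: 3, 3: 2, 4: 1}}

print({x: element_order(x, TABLE_5, identity=1) for x in TABLE_5})
# {1: 1, 2: 2, 3: 2, 4: 2}: every non-identity element has order 2, as in Table 7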
3.4. Conditions for SBDRL to apply
To simplify the problem, for the remainder of this paper, unless otherwise stated, we only
consider worlds where the transformations of the world are due only to the actions of an
agent; this means we do not have to take into consideration how the agent
would deal with transformations of the world that are not due to its actions. Therefore, we will
only consider worlds with $D = D_A$.
To be a group, $A/\sim$ must satisfy the properties of (1) identity, (2) associativity, (3)
closure, and (4) inverse. (1) The identity property and (2) the associativity property are
satisfied by Proposition 3.4 and Proposition 3.5 respectively. (3) For the closure property
to be satisfied, the following condition is sufficient:
World condition 1 (Unrestricted actions). For any action $a \in A$ and for any world
state $w \in W$,