Algebras of actions in an agent’s representations of the world
Alexander Dean^{a,*}, Eduardo Alonso^{a}, Esther Mondragón^{a}
^{a}Artificial Intelligence Research Centre (CitAI), Department of Computer Science,
City, University of London, Northampton Square, EC1V 0HB, London, UK
Abstract
In this paper, we propose a framework to extract the algebra of the transformations of
worlds from the perspective of an agent. As a starting point, we use our framework
to reproduce the symmetry-based representations from the symmetry-based disentan-
gled representation learning (SBDRL) formalism proposed by [1]; only the algebra of
transformations of worlds that form groups can be described using symmetry-based rep-
resentations.
We then study the algebras of the transformations of worlds with features that occur
in simple reinforcement learning scenarios. Using computational methods that we
developed, we extract the algebras of the transformations of these worlds and classify
them according to their properties.
Finally, we generalise two important results of SBDRL, the equivariance condition
and the disentangling definition, from only working with symmetry-based representations
to working with representations capturing the transformation properties of worlds
with transformations for any algebra. We then combine our generalised equivariance
condition and our generalised disentangling definition to show that disentangled
sub-algebras can each have their own individual equivariance conditions, which can be
treated independently.
Keywords: Representation learning, Agents, Disentanglement, Symmetries, Algebra.
1. Introduction
Artificial intelligence (AI) has progressed significantly in recent years due to massive
increases in available computational power allowing the development and training of new
deep learning algorithms [2, 3]. However, the best-performing learning algorithms often
suffer from poor data efficiency and lack the levels of robustness and generalisation that
are characteristic of nature-based intelligence. The brain appears to solve complex tasks
by learning efficient, low-dimensional representations through the simplification of the
tasks by focusing on the aspects of the environment that are important to the performance
of each task [4, 5, 6, 7, 8, 9]. Furthermore, there is evidence that representations in nature
*Corresponding author.
Email addresses: alexander.dean@city.ac.uk (Alexander Dean), e.alonso@city.ac.uk (Eduardo
Alonso), e.mondragon@city.ac.uk (Esther Mondragón)
Preprint submitted to Artificial Intelligence October 4, 2023
arXiv:2310.01536v1 [cs.AI] 2 Oct 2023
and in artificial intelligence networks support generalisation and robustness [10, 11, 12,
13, 14, 15, 16].
In its most general form, a representation is an encoding of data. The encoded form
of the data is usually more useful than the original data in some way. For example,
the representation of the data could be easier to transfer than the original data, it
could improve the efficiency or performance of an algorithm or biological system that
uses the representation [17, 18], or the representation could be a recreation of a partially
observable data distribution through the combination of many samples of the distribution
[19, 20].
In machine learning, the use of representations has been shown to improve generalisation
capabilities and task performance across a range of areas such as computer vision,
natural language processing, and reinforcement learning [18, 21, 22, 23]. Representation
learning algorithms can be thought of as seeking to remove the need for human-curated,
task-specific feature engineering by performing useful feature engineering automatically.
Representations even emerge naturally in neural networks trained to perform multiple
tasks [24]. Representation learning is the process of devising a method to transform the
input data into its encoded form.
An agent can interact with its world. In this work, we consider an artificial agent
interacting with a world to learn a representation of that world. In particular, we are
interested in the representation that the agent should have ideally learned by the end
of the representation learning process. There are some properties of the world that can
only be learned in the agent’s representation if the agent interacts with its environment
[25, 26]. Agents have an internal state (the agent’s representation), which can be viewed
as the agent’s understanding of the world, and an external state, which is made up of
everything in the world that is not in the agent’s internal state. The agent accesses information
about its external state through sensory states (or sensors) and affects the world through
action states (or actions). This boundary between the external and internal states is
known as a Markov blanket [27, 28]. The world can be treated as a data distribution,
which the agent samples using its sensors and interacts with using its action states. The
agent continually updates its internal state using information gained from its sensors as
it learns more about the world (representation learning). The agent is attempting to
create a representation of this data distribution by applying various operations to the
distribution and observing the outcome of these operations. The operations that the
agent applies to the distribution are the actions of the agent, and each sample that the
agent takes of the distribution using its sensors is an observation of the world. For our
purposes, it is important to note that the agent does not have to be embodied within the
world it is interacting with, it only needs to be able to sample observations of the world
using its sensors and to be able to interact with the world using its actions.
The area of representation learning that deals with representations of worlds that
are influenced by the actions of agents is called state representation learning. State
representation learning is a particular case of representation learning where features are
low dimensional, evolve through time, and are influenced by the actions of agents [17].
More formally, an agent uses sensors to form observations of the states of the world
(the observation process); these observations can then be used to create representations
of the world (the inference process). State representation learning has been combined
with reinforcement learning to enable algorithms to learn more efficiently and with an
improved ability to transfer learnings to other tasks (improved generalisation) [29]. It
has been hypothesised that the success of state representation learning is because the
learned abstract representations describe general priors about the world in a way that is
easier to interpret and utilise while not being task specific; this makes the representations
useful for solving a range of tasks [18].
Reinforcement learning is a decision-making algorithm that involves an agent inter-
acting with its environment in order to maximise the total amount of some reward signal
it receives [30, 31, 32, 33]. We choose to explore the structure of worlds containing
features found in common reinforcement learning scenarios because representations of a
world, obtained by an agent interacting with that world as in our work, are often given
to reinforcement learning algorithms to improve their performance.
The question of what makes a β€˜good’ representation is vital to representation learn-
ing. [1] argues that the symmetries of a world are important structures that should
be present in the representation of that world. The study of symmetries moves us away
from studying objects directly to studying the transformations of those objects and using
the information about these transformations of objects to discover properties about the
objects themselves [34]. The field of Physics experienced such a paradigm shift due to
the work of Emmy Noether, who proved that conservation laws correspond to continuous
symmetry transformations [35], and the field of AI is undergoing a similar shift.
The exploitation of symmetries has led to many successful deep-learning architec-
tures. Examples include convolutional layers [36], which utilise translational symmetries
to outperform humans in image recognition tasks [37], and graph neural networks [38],
which utilise the group of permutations. Not only can symmetries provide a useful indica-
tor of what an agent has learned, but incorporating symmetries into learning algorithms
regularly reduces the size of the problem space, leading to greater learning efficiency
and improved generalisation [34]. In fact, it has been shown that a large majority of
neural network architectures can be described as stacking layers that deal with different
symmetries [39]. The main methods used to integrate symmetries into a representation
are to build symmetries into the architecture of learning algorithms [40, 41], use data-
augmentation that encourages the model to learn symmetries [42, 43], or to adjust the
model’s learning objective to encourage the representation to exhibit certain symme-
tries [44, 45]. The mathematical concept of symmetries can be abstracted to algebraic
structures called groups.
There are two main types of symmetries that are used in AI: invariant symmetries,
where a representation does not change when certain transformations are applied to
it, and equivariant symmetries, where the representation reflects the symmetries of the
world. Historically, the learning of representations that are invariant to certain transfor-
mations has been a successful line of research [46, 47, 48, 49]. In building these invariant
representations, the agent effectively learns to ignore the invariant transformation since
the representation is unaffected by the transformation. It has been suggested that this
approach can lead to more narrow intelligence, where an agent becomes good at solving
a small set of tasks but struggles with data efficiency and generalisation when tasked
with new learning problems [50, 51]. Instead of ignoring certain transformations, the
equivariant approach attempts to preserve symmetry transformations in the agent’s rep-
resentation in such a way that the symmetry transformations of the representation match
the symmetry transformations of the world. It has been hypothesised that the equiv-
ariant approach is likely to produce representations that can be reused to solve a more
diverse range of tasks since no transformations are discarded [34]. Equivariant symmetry
approaches are commonly linked with disentangling representations [18], in which the
agent’s representation is separated into subspaces that are invariant to different trans-
formations. Disentangled representation learning, which aims to produce representations
that separate the underlying structure of the world into disjoint parts, has been shown
to improve the data efficiency of learning [52, 53].
Inspired by their use in Physics, symmetry-based disentangled representations (SB-
DRs) were proposed by [1] as a formal mathematical definition of disentangled represen-
tations. SBDRs are built on the assumption that the symmetries of the world state are
important aspects of that world state that need to be preserved in an agent’s internal rep-
resentation (i.e., the symmetries that are present in the world state should also be present
in the agent’s internal representation state). They describe symmetries of the world state
as “transformations that change only some properties of the underlying world state, while
leaving all other properties invariant” [1, page 1]. For example, the $y$-coordinate of an
agent moving parallel to the $x$-axis on a 2D Euclidean plane does not change. SBDRL has
gained traction in AI in recent years [54, 55, 56, 57, 58, 59, 60, 61, 25, 62, 63]. However,
symmetry-based disentangled representation learning (SBDRL) only considers actions
that form groups and so cannot take into account, for example, irreversible actions [1].
[25] showed that a symmetry-based representation cannot be learned using only a train-
ing set composed of the observations of world states; instead, a training set composed
of transitions between world states, as well as the observations of the world states, is
required. In other words, SBDRL requires the agent to interact with the world. This is
in agreement with the empirically proven idea that active sensing of a world can be used
to make non-invertible mappings from the world state to the representation state into
invertible mappings [64], and gives mathematical credence to SBDRL.
We agree with [1] that symmetry transformations are important structures to include
in an agent’s representation, but want to take their work one step further: we posit that
the relationships of transformations of the world due to the actions of the agent should
be included in the agent’s representation of the world. We will show that only including
transformations of the actions of an agent that form groups in an agent’s representation
of a world would lose important information about the world, since we demonstrate that
features of many worlds cause transformations due to the actions of an agent to not form
group structures. We generalise some important results, which have been put forward
for worlds with transformations of the actions of an agent that form groups, to worlds
with transformations of the actions of an agent that do not form groups. We believe that
including the relationships of these transformations in the agent’s representation has the
potential for powerful learning generalisation properties: (a) Take the following thought
experiment: Consider an agent that has learned the structure of the transformations due
to its actions of a world $W$. Now consider a world $W'$, which is identical to world $W$ in
every way except observations of the state of the world from a collection of the agent’s
sensors are shifted by a constant amount $\epsilon$ (if the sensors were light sensors then $W$ and
$W'$ would only be different in colour). So the observations of $W'$ in state $s'$ would be given
by $o_{s'} = o_s + \epsilon$, where $o_{s'}$ is the observation of $W'$ in state $s'$, and $o_s$ is the observation of
$W$ in the state $s$. The relationship between the transformations of the world due to the
agent’s actions is the same in both worlds but the observations are different. Therefore,
the agent only needs to learn how to adjust for the shift $\epsilon$ in the data from some of its
sensors to be able to learn a good representation of $W'$. In fact, the agent might have
already learned that the transformations of the world due to particular actions cause
relative changes to certain sensor values rather than depending on the raw sensor values.
(b) If the agent completely learns the structure of the world from one state $s$ then it
knows it from all states of the world. By taking any sequence of actions that go from $s$
to some new state and then applying that sequence of actions to the relations that the
agent possesses for state $s$, the agent can produce the relationship between actions from
the new state (the relationship between the actions is dependent on the current state,
similar to how local symmetries in Physics are dependent on space-time coordinates).
(c) Another form of generalisation could be due to the action algebra being independent
of the starting state; in other words, the relationship between transformations due to
the agent’s actions from one state is the same as from any other state in the world;
the relationships between actions have been generalised to every state in the world. This
would allow the agent to learn the effect of its actions faster since the relationship between
actions is the same in any state. (d) In a partially observable world, previously explored
relationships between actions could be extrapolated to unexplored areas. If these areas
have the same or similar structure to the relationships between actions then the agent
could generalise what it has learned previously to unexplored parts of the world.
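Point (a) can be made concrete with a short sketch. The following Python code is our illustration only; `calibrate_shift` is a hypothetical helper that estimates the constant sensor shift $\epsilon$ from paired observations of corresponding states, after which the previously learned transition structure carries over unchanged.

```python
def calibrate_shift(obs_w, obs_w_prime):
    """Estimate the constant sensor shift between two otherwise identical
    worlds W and W' from paired observations of corresponding states,
    following o_{s'} = o_s + eps."""
    shifts = [o2 - o1 for o1, o2 in zip(obs_w, obs_w_prime)]
    assert all(s == shifts[0] for s in shifts), "shift is not constant"
    return shifts[0]

# Observations of corresponding states in W and W' (W' shifted by eps = 5):
obs_w       = [3, 7, 5, 9]
obs_w_prime = [8, 12, 10, 14]
eps = calibrate_shift(obs_w, obs_w_prime)

# Only the observations need correcting; the learned relationships between
# the transformations due to the agent's actions are identical in W and W'.
assert eps == 5
assert [o - eps for o in obs_w_prime] == obs_w
```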
We also believe that a general framework for exploring the algebra of the transfor-
mations of worlds containing an agent, as proposed in this paper, has the potential to
be used as a tool in the field of explainable AI. With such a framework, we are able to
predict which algebraic structures should appear in the agent’s representation at the end
of the learning process. Being able to predict the structures that should be present in an
agent’s representation in certain worlds and using certain learning algorithms would be
a powerful explanatory tool. For example, if there is a sharp improvement in an agent’s
performance at a task at a certain point of learning it could be the case that certain alge-
braic structures are present in the agent’s representation after the sharp improvement in
performance that were not present before the sharp improvement; if so, then it could be
argued that the sharp increase in performance is due to the ‘discovery’ of the algebraic
structure in the agent’s representation.
We aim to help answer the question of which features should be present in a β€˜good’
representation by, as suggested by [1], looking at the transformation properties of worlds.
However, while [1] only considered symmetry transformations, we aim to go further and
consider the full algebra of various worlds. We propose a mathematical framework to
describe the transformations of the world and therefore describe the features we expect
to find in the representation of an artificial agent. We derive [1]’s SBDRs using our
framework; we then use category theory to generalise elements of their work, namely
their equivariance condition and their definition of disentangling, to worlds where the
transformations of the world cannot be described using [1]’s framework. This paper aims
to make theoretical contributions to representation learning and does not propose new
learning algorithms. More specifically, our contributions are the following:
1. We propose a general mathematical framework for describing the transformation
of worlds due to the actions of an agent that can interact with the world.
2. We derive the SBDRs proposed by [1] from our framework and in doing so identify
the limitations of SBDRs in their current form.
3. We use our framework to explore the structure of the transformations of worlds
for classes of worlds containing features found in common reinforcement learning
scenarios. Our contributions are to the field of representation learning and not
the field of reinforcement learning. We also present the code used to work out the
algebra of the transformations of worlds due to the actions of an agent.
4. We generalise the equivariance condition and the definition of disentangling given
by [1] to worlds that do not satisfy the conditions for SBDRs. This generalisation
is performed using category theory.
This paper is structured as follows: In Section 2 we define our framework and then
describe how it deals with generalised worlds, which consist of distinct world states
connected by transitions that describe the dynamics of the world. We define transitions
and some of their relevant properties. Then we define the actions of an agent in our
framework. In Section 3 our framework is used to reproduce the SBDRL framework
given by [1]. This is achieved by defining an equivalence relation that makes two actions
of an agent equivalent if they produce the same outcome when performed from any world
state. In Section 4, we apply our framework to worlds exhibiting common
reinforcement learning scenarios that cannot be described fully using SBDRs and study
the algebraic structures exhibited by the dynamics of these worlds. In Section 5, we
generalise two important results of [1], the equivariance condition and the disentangling
definition, to worlds with transformations with algebras that do not fit into the SBDRL
paradigm. We finish with a discussion in Section 6.
2. A mathematical framework for an agent in a world
We now introduce our general mathematical framework for formally describing the
transformations of a world. We begin by considering a generalised discrete world con-
sisting of a set of discrete states, which are distinguishable in some way, and a set of
transitions between those states; these transitions are the world dynamics (i.e., how the
world can transform from one world state to another world state). This world can be
represented by a directed multigraph, where the world states are the vertices of the graph
and the world dynamics are arrows between the vertices. We will use this framework to
reproduce the group action structure of the dynamics of a world in an artificial agent’s
representation as described by [1], and in doing so we uncover the requirements for this
group action structure to be present in a world.
2.1. Simplifications
Fully observable and partially observable worlds. Each observation of the world could
contain all the information about the state of the distribution at the instance the obser-
vation is taken (the world is fully observable) or only some of that information (the world
is partially observable). Partially observable worlds are common in many AI problems.
However, in this work, we explore the structure of representations of agents in fully
observable worlds. Our focus is on fully observable worlds because (1) their treatment is
simpler than that of partially observable worlds, making them a good starting point, and
(2) the ideal end result for a world being explored through partial observability should be
as close as possible to the end result for the same world being explored through full
observability, so in identifying structures that should be present in the representations of
fully observable worlds, we are also identifying structures that should ideally be present
in the representations of partially observable worlds, without having to consider the
complications of partial observability.
Discrete vs continuous worlds. For simplicity, we only consider worlds made of discrete
states. However, we argue that it is actually more natural to consider discrete worlds.
Consider an agent with $n$ sensors $\theta_1, \dots, \theta_n$ that can interact with a world $W$ using continuous
actions $A$ to produce a sensor observation: $\theta_i(w) = o_i$, where $i \in \{1, \dots, n\}$ and $w \in W$. We
hypothesise that there will be actions $da \in A$ that cause changes in the world state that
are so small that the agent’s sensors will not perceive the change (i.e., $\theta_i(da * w) = o_i$).
Therefore, there will be discrete jumps between perceptible states of the world in the
agent’s representation.
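This quantisation argument can be illustrated with a toy sensor. The following Python sketch is our own, with an assumed `resolution` parameter; it is not part of the formal framework.

```python
def sensor(x: float, resolution: float = 0.1) -> float:
    """A sensor with finite resolution: it reports the world's continuous
    state x quantised to the nearest multiple of `resolution`."""
    return round(x / resolution) * resolution

# A continuous action da that changes the state by less than the sensor
# resolution leaves the observation unchanged: sensor(da * w) == sensor(w).
w = 0.50
da = 0.01                       # tiny continuous action
assert sensor(w + da) == sensor(w)

# A larger change produces a discrete jump between perceptible states.
assert sensor(w + 0.08) != sensor(w)
```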
2.2. Mathematical model of a world
We consider a world as a set of discrete world states, which are distinguishable in
some way, and a set of world state transitions between those states; these world state
transitions are the transformations of the world (i.e., how a world state can transform
into another world state). We point out that this description of a world is isomorphic
to a directed multigraph, where the world states are the vertices of the graph and the
world state transitions are arrows between the vertices.
2.2.1. World states and world state transitions
We believe that defining a world as a discrete set of world states with world state
transitions between them is the most general definition of a world, and so take it as our
starting point from which we will build towards defining the algebra of the actions of an
agent. We are going to use these transitions to define the actions of an agent.
Transitions. We consider a directed multigraph $\mathcal{W} = (W, \hat{D}, s, t)$, where $W$ is a set of
world states, $\hat{D}$ is a set of minimum world state transitions, and $s, t : \hat{D} \to W$; $s$ is called
the source map and $t$ is called the target map. For the remainder of the paper, we fix
such a $(W, \hat{D}, s, t)$. $\mathcal{W}$ is called a world.

Minimum world state transitions are extended into a set $D$ of paths called world state
transitions: a path is a sequence of minimum world state transitions $d = \hat{d}_n \circ \hat{d}_{n-1} \circ \dots \circ \hat{d}_1$
such that $t(\hat{d}_i) = s(\hat{d}_{i+1})$ for $i = 1, \dots, n-1$. We extend $s, t$ to $D$ as $s(d) = s(\hat{d}_1)$ and
$t(d) = t(\hat{d}_n)$. We also extend the composition operator $\circ$ to $D$ such that $d_n \circ d_{n-1} \circ \dots \circ d_1$
is defined if $t(d_i) = s(d_{i+1})$ for $i = 1, \dots, n-1$. For $d \in D$ with $s(d) = w$ and $t(d) = w'$, we
will often denote $d$ by $d : w \to w'$.
For the rest of the paper, we assume that, for each world state $w \in W$, there is a
unique trivial world state transition $1_w \in \hat{D}$ with $s(1_w) = t(1_w) = w$; the trivial transition $1_w$
is associated with the world being in state $w$ and then no change occurring due to the
transition $1_w$.
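These definitions can be made concrete in a short Python sketch. The following is illustrative only (it is not code from our released implementation); `Transition`, `compose`, and `apply` are ad hoc names for minimum transitions with their source and target maps, the partial composition operator $\circ$, and an application function in the spirit of the partial function $*$ defined later in this section.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    """A minimum world state transition d with source s(d) and target t(d)."""
    name: str
    source: str
    target: str

def compose(*ds: Transition) -> Transition:
    """Compose d_n o ... o d_1 (arguments given right-to-left, as in the
    text). Defined only when t(d_i) = s(d_{i+1})."""
    path = list(reversed(ds))          # d_1 is applied first
    for a, b in zip(path, path[1:]):
        if a.target != b.source:
            raise ValueError("composition undefined: t(d_i) != s(d_{i+1})")
    return Transition("o".join(d.name for d in ds), path[0].source, path[-1].target)

def apply(d: Transition, w: str) -> str:
    """The partial function *: D x W -> W; d * w is defined only if s(d) = w."""
    if d.source != w:
        raise ValueError("d * w undefined: s(d) != w")
    return d.target
```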
Connected and disconnected worlds. We now introduce connected and disconnected worlds.
Simply, a world 𝐴is connected to a world 𝐡if there is a transition from a world state
in world 𝐴to a world state in world 𝐡. The concepts of connected and disconnected
worlds are necessary for generality; we are only interested in the perspective of the agent
and so only care about the world states and transitions that the agent can come into
contact with. Connected and disconnected worlds give us the language to describe and
then disregard the parts of worlds that the agent will never explore and therefore are not
relevant to the agent’s representation. For example, if an agent is in a maze and a section
of the maze is inaccessible from the position that the agent is in, then that section of the
maze would be disconnected from the section of the maze that the agent is in; if we want
to study how the agent’s representation evolves as it learns, it makes sense to disregard
the disconnected section of the maze since the agent never comes into contact with it
and so the disconnected section of the maze will not affect the agent’s representation.
Formally, we first define a sub-world $W'$ of a world $W$ as a subset $W' \subseteq W$ along with
$D' = \{d \in D \mid s(d) \in W' \text{ and } t(d) \in W'\}$. Note that a sub-world is a world. A sub-world
$W$ is connected to a sub-world $W'$ if there exists a transition $d : w \to w'$ where $w \in W$
and $w' \in W'$; if no such transition exists, then $W$ is disconnected from $W'$. Similarly, a
world state $w$ is connected to a sub-world $W'$ if there exists a transition $d : w \to w'$ where
$w' \in W'$; if no such transition exists, then $w$ is disconnected from $W'$.
Effect of transitions on world states. We define $*$ as a partial function $D \times W \to W$ by
$d * w = w'$ where $d : w \to w'$, and undefined otherwise.
2.2.2. Example
We consider a cyclical $2 \times 2$ grid world, denoted by $\mathcal{W}_c$, containing an agent as shown
in Figure 1. The transformations of $\mathcal{W}_c$ are due to an agent moving either up ($U$), down
($D$), left ($L$), right ($R$), or doing nothing ($1$). The possible world states of $\mathcal{W}_c$ are shown
in Figure 1. $\mathcal{W}_c$, and variations of it, are used as a running example to illustrate the
concepts presented in this paper.

The world being cyclical means that if the agent performs the same action enough
times, then it will return to its starting position; for example, in the world $\mathcal{W}_c$, if the
agent performs the action $U$ twice when the world is in state $w_0$ in Figure 1, then the
world will transition back into the state $w_0$ (i.e., $U^2 * w_0 = w_0$). The transition due to
performing each action in each state can be found in Table 1.
1π‘ˆ 𝐷 𝐿 𝑅
𝑀0𝑀0𝑀2𝑀2𝑀1𝑀1
𝑀1𝑀1𝑀3𝑀3𝑀0𝑀0
𝑀2𝑀2𝑀0𝑀0𝑀3𝑀3
𝑀3𝑀3𝑀1𝑀1𝑀2𝑀2
Table 1: Each entry in this table shows the outcome state of the agent performing the action given in
the column label when in the world state given by the row label.
The transitions shown in Table 1 can be represented as the transition diagram given in
Figure 2. It should be noted that, since the structure of the diagram is wholly dependent
on the arrows between the world states, the positioning of the world states is an arbitrary
choice.
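Table 1 can be encoded directly and the cyclical property checked computationally. The following Python sketch is our illustration (not the released code accompanying this paper); `TABLE` and `act` are ad hoc names.

```python
# Transition table for the cyclical 2x2 grid world (Table 1):
# TABLE[state][action] is the outcome state.
TABLE = {
    "w0": {"1": "w0", "U": "w2", "D": "w2", "L": "w1", "R": "w1"},
    "w1": {"1": "w1", "U": "w3", "D": "w3", "L": "w0", "R": "w0"},
    "w2": {"1": "w2", "U": "w0", "D": "w0", "L": "w3", "R": "w3"},
    "w3": {"1": "w3", "U": "w1", "D": "w1", "L": "w2", "R": "w2"},
}

def act(state: str, actions: str) -> str:
    """Apply a sequence of minimum actions left to right, e.g. act("w0", "UU")."""
    for a in actions:
        state = TABLE[state][a]
    return state

# The world is cyclical: performing the same move twice returns the agent
# to its starting position, e.g. U^2 * w0 = w0, for every action and state.
assert act("w0", "UU") == "w0"
assert all(act(w, a + a) == w for w in TABLE for a in "UDLR")
```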
2.3. Agents
We now define the actions of agents as collections of transitions that are labelled by
their associated action. We then define some relevant properties of these actions.
(a) $w_0$ (b) $w_1$ (c) $w_2$ (d) $w_3$
Figure 1: The world states of a cyclical $2 \times 2$ grid world $\mathcal{W}_c$, where changes to the world are due to an
agent moving either up, down, left, or right. The position of the agent in the world is represented by
the position of the circled A.
Figure 2: A transition diagram for the transitions shown in Table 1.
2.3.1. Treatment of agents
We now discuss how agents are treated in our model. We consider worlds containing
an embodied agent, which is able to interact with the environment by performing actions.
The end goal of the agent’s learning process is to map the useful aspects of the structure
of the world to the structure of its representation; the useful aspects are those that enable
the agent to complete whatever task it has.
We use the treatment of agents adopted by [1]. The agent has an unspecified number
of unspecified sensory states that allow it to make observations of the state of the envi-
ronment. Information about the world state that the agent is currently in is delivered to
the internal state of the agent through the sensory states. Mathematically, the process
of information propagating through the sensory states is a mapping 𝑏:π‘Šβ†’π‘‚(the
β€˜observation process’), which produces a set of observations π‘œ1containing a single obser-
vation for each sensory state. These observations are then used by an inference process
β„Ž:𝑂→𝑍to produce an internal representation. The agent then uses some internal
mechanism to select an action to perform; this action is performed through active states.
The agent can be thought of as having a (non-physical) boundary between its internal
state, containing its internal state representation(s) of the world, and the external world
state. Information about the world state is accessed by the internal state through sensory
states only (the observation process). The agent affects the world using active states.
It is important to note that the agent’s state representation only reflects the ob-
servations the agent makes with its sensors; in other words, the agent’s internal state
is built using the information about aspects of the world state propagated through its
sensory states, in the form that the observation process provides the information, and
not directly from the world state. For example, the human eye (the sensor) converts
information about the light entering the eye into electrical and chemical energy in the
optic nerve (the observation process) and then the information from the optic nerve is
given to the brain for inference. Information can be lost or modified during the observation
process; for example, light sensors might only detect certain wavelengths of light, or
might take in a colour spectrum and output greyscale. This means that world states
with differences that are not detectable by the agent’s sensors are effectively identical
from the agent’s perspective and so can be treated as such when we are constructing a
mathematical description of the agent’s interaction with the world.
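The observation and inference maps above can be sketched in a few lines of code. The world states, sensor, and brightness values below are entirely hypothetical; the point is only that two world states the sensor cannot distinguish receive the same representation under 𝑓 = β„Ž β—¦ 𝑏.

```python
# Sketch of the observation process b: W -> O and inference process h: O -> Z.
# Hypothetical toy world: states are (position, colour) pairs, but the agent's
# sensor reports only brightness, discarding hue (information is lost).

COLOUR_BRIGHTNESS = {"red": 0.3, "green": 0.6, "blue": 0.3}

def b(world_state):
    """Observation process: position is sensed exactly, colour only as brightness."""
    position, colour = world_state
    return (position, COLOUR_BRIGHTNESS[colour])

def h(observation):
    """Inference process: build an internal representation from the observation."""
    position, brightness = observation
    return {"pos": position, "bright": brightness > 0.5}

def f(world_state):
    """Composite mapping f = h . b from world states to representations."""
    return h(b(world_state))
```

World states differing only in a way the sensor cannot detect (here, red vs. blue, which have equal brightness) are effectively identical from the agent's perspective: they map to the same representation.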
2.3.2. Actions of an agent as labelled transitions
Consider a set ˆ𝐴 called the set of minimum actions. Let the set 𝐴 be the set of
all finite sequences formed from the elements of the set ˆ𝐴; we call 𝐴 the set of actions.
Consider a set ˆ𝐷𝐴 βŠ‚ 𝐷, where 1𝑀 ∈ ˆ𝐷𝐴 for all 𝑀 ∈ π‘Š; we call ˆ𝐷𝐴 the set of minimum
action transitions. We consider a labelling map ˆ𝑙: ˆ𝐷𝐴 β†’ ˆ𝐴 such that:
1. Any two distinct transitions leaving the same world state are labelled with different
actions.
Action condition 1. For any distinct 𝑑, 𝑑′ ∈ ˆ𝐷𝐴 with 𝑠(𝑑) = 𝑠(𝑑′), ˆ𝑙(𝑑) β‰  ˆ𝑙(𝑑′).
2. There is an identity action that leaves any world state unchanged.
Action condition 2. There exists an action 1 ∈ ˆ𝐴 such that ˆ𝑙(1𝑀) = 1 for all
𝑀 ∈ π‘Š. We call 1 the identity action.
Figure 3: Labelling the transitions in Figure 2 with the relevant actions in 𝐴.
Given ˆ𝐷𝐴 as defined above and satisfying action condition 1 and action condition 2,
we define 𝑄𝐷𝐴 = (π‘Š, ˆ𝐷𝐴, 𝑠𝐴, 𝑑𝐴), where 𝑠𝐴, 𝑑𝐴 are the restrictions of 𝑠, 𝑑 to the set ˆ𝐷𝐴.
We now define a set 𝐷𝐴, the set of action transitions, which is the set of all paths of
𝑄𝐷𝐴.
We extend the map ˆ𝑙 to a map 𝑙: 𝐷𝐴 β†’ 𝐴 such that if 𝑑 = ˆ𝑑𝑛 β—¦ ... β—¦ ˆ𝑑1 then
𝑙(𝑑) = ˆ𝑙(ˆ𝑑𝑛) ... ˆ𝑙(ˆ𝑑1). For 𝑑 ∈ 𝐷𝐴 with 𝑠(𝑑) = 𝑀, 𝑑(𝑑) = 𝑀′ and 𝑙(𝑑) = π‘Ž, we will often
denote 𝑑 by 𝑑: 𝑀 βˆ’π‘Žβ†’ 𝑀′.
If an action π‘Ž ∈ 𝐴 is expressed in terms of its minimum actions as π‘Ž = Λ†π‘Žπ‘› β—¦ ... β—¦ Λ†π‘Ž1,
then π‘Ž = 𝑙(𝑑) = 𝑙(ˆ𝑑𝑛 β—¦ ... β—¦ ˆ𝑑1) = ˆ𝑙(ˆ𝑑𝑛) β—¦ ... β—¦ ˆ𝑙(ˆ𝑑1) = Λ†π‘Žπ‘› β—¦ ... β—¦ Λ†π‘Ž1, where the Λ†π‘Žπ‘– are called
minimum actions.
Remark 2.3.1. For a given 𝑀 ∈ π‘Š, we can label transitions in 𝐷𝐴 with an appropriate
element of 𝐴 as follows: for each 𝑑 ∈ 𝐷𝐴 with 𝑠(𝑑) = 𝑀, express 𝑑 in terms
of its minimum transitions in 𝐷𝐴 as 𝑑 = 𝑑𝑛 β—¦ ... β—¦ 𝑑2 β—¦ 𝑑1; if ˆ𝑙(𝑑𝑖) = π‘Žπ‘– then 𝑑 is labelled
with π‘Žπ‘› ... π‘Ž2π‘Ž1 ∈ 𝐴. We denote the map that performs this labelling by 𝑙: 𝐷𝐴 β†’ 𝐴.
Figure 3 shows how transitions are labelled with actions in our 2 Γ— 2 cyclical world
example. We only show the minimum actions for simplicity, but there are in fact infinitely
many action transitions between each pair of world states; for example, the action transitions
from 𝑀0 to 𝑀1 include those labelled by 𝐷 β—¦ 𝑅, 𝐷 β—¦ 𝑅 β—¦ 1𝑛 (𝑛 ∈ N), 1𝑛 β—¦ 𝐷 β—¦ 𝑅 (𝑛 ∈ N),
and 𝐷 β—¦ 𝑅 β—¦ (𝐿 β—¦ 𝑅)𝑛 (𝑛 ∈ N).
Effect of actions on world states. We define the effect of the action π‘Ž ∈ 𝐴 on the world state
𝑀 ∈ π‘Š as follows: if there exists 𝑑 ∈ 𝐷𝐴 such that 𝑠(𝑑) = 𝑀 and 𝑙(𝑑) = π‘Ž, then
π‘Ž βˆ— 𝑀 = 𝑑(𝑑); if there exists no 𝑑 ∈ 𝐷𝐴 such that 𝑠(𝑑) = 𝑀 and 𝑙(𝑑) = π‘Ž, then we say
that π‘Ž βˆ— 𝑀 is undefined. The effect of actions on world states is well-defined due to action
condition 1. We can apply the minimum actions that make up an action to world states
individually: if π‘Ž βˆ— 𝑀 is defined and π‘Ž = Λ†π‘Žπ‘˜ ... Λ†π‘Ž1 then π‘Ž βˆ— 𝑀 = (Λ†π‘Žπ‘˜ ... Λ†π‘Ž1) βˆ— 𝑀 = (Λ†π‘Žπ‘˜ ... Λ†π‘Ž2) βˆ— (Λ†π‘Ž1 βˆ— 𝑀).
Physically, the identity action 1 ∈𝐴corresponds to the no-op action (i.e., the world state
does not change due to this action).
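The stepwise application of minimum actions, and the possibility of π‘Ž βˆ— 𝑀 being undefined, can be sketched as follows. The toy dynamics below are assumed for illustration and are not the 2 Γ— 2 cyclical world.

```python
# Applying an action (a finite sequence of minimum actions) to a world state
# one minimum action at a time; a * w is undefined (None here) whenever some
# step has no corresponding transition. Dynamics are illustrative only.

STEP = {  # (minimum action, state) -> next state; missing pairs are undefined
    ("1", "w0"): "w0", ("1", "w1"): "w1",
    ("R", "w0"): "w1",                      # R has no transition out of w1
    ("L", "w1"): "w0",
}

def apply_action(action, w):
    """Effect of a = a_n ... a_1 on w, computed as a_n ... a_2 * (a_1 * w).

    `action` is a tuple (a_1, ..., a_n) with a_1 applied first.
    """
    for minimum_action in action:
        w = STEP.get((minimum_action, w))
        if w is None:                        # a * w is undefined
            return None
    return w
```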
Actions as (partial) functions. Consider all the transitions that are labelled by a particular
action π‘Ž ∈ 𝐴. Together these transitions form a partial function π‘“π‘Ž: π‘Š β†’ π‘Š because,
for any 𝑀 ∈ π‘Š, either π‘Ž βˆ— 𝑀 is undefined, or π‘Ž βˆ— 𝑀 is defined and there is a unique world
state 𝑀′ ∈ π‘Š for which π‘Ž βˆ— 𝑀 = 𝑀′ (due to action condition 1). π‘“π‘Ž is not generally surjective
because, for a given 𝑀 ∈ π‘Š, there is not necessarily a transition 𝑑 ∈ 𝐷 with 𝑙(𝑑) = π‘Ž
and 𝑑(𝑑) = 𝑀. π‘“π‘Ž is not generally injective because it is possible to have an environment
where π‘“π‘Ž(𝑀) = π‘“π‘Ž(𝑀′) for some 𝑀 ∈ π‘Š different from 𝑀′ ∈ π‘Š. We can also reproduce
these functions using the formalism given by [25], which describes the dynamics of the
world in terms of a multivariate function 𝑓: 𝐴 Γ— π‘Š β†’ π‘Š. If we let 𝑓: 𝐴 Γ— π‘Š β†’ π‘Š
be the dynamics of the environment, then the transition caused by an action π‘Ž ∈ 𝐴 on
a world state 𝑀 ∈ π‘Š (where π‘Ž βˆ— 𝑀 is defined) is given by (π‘Ž, 𝑀) ↦→ 𝑓(π‘Ž, 𝑀) = π‘Ž βˆ— 𝑀.
Mathematically, we curry the function 𝑓: 𝐴 Γ— π‘Š β†’ π‘Š to give a collection {π‘“π‘Ž} of partial
functions, with a partial function ˆ𝑓(π‘Ž) = π‘“π‘Ž: π‘Š β†’ π‘Š for each action π‘Ž ∈ 𝐴, as
Curry: (𝑓: 𝐴 Γ— π‘Š β†’ π‘Š) β†’ (π‘“π‘Ž: π‘Š β†’ π‘Š).
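The currying step can be sketched directly. The dynamics table below is hypothetical; `curry` turns a lookup table for 𝑓(π‘Ž, 𝑀) into one partial function per action, returning None wherever π‘“π‘Ž is undefined.

```python
# Currying world dynamics f: A x W -> W into a family {f_a} of partial
# functions f_a: W -> W, one per action. Dynamics are illustrative only.

DYNAMICS = {
    ("R", "w0"): "w1", ("R", "w1"): "w0",
    ("L", "w0"): "w1", ("L", "w1"): "w0",
    ("1", "w0"): "w0", ("1", "w1"): "w1",
}

def curry(dynamics):
    """Turn a table for f(a, w) into a dict mapping each action a to f_a."""
    per_action = {}
    for (a, w), w_next in dynamics.items():
        per_action.setdefault(a, {})[w] = w_next
    # dict.get returns None outside the domain, modelling a partial function
    return {a: table.get for a, table in per_action.items()}

f_curried = curry(DYNAMICS)
```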
3. Reproducing SBDRL
We will now use the framework set out in the previous section to reproduce the SBDRs
of [1]. We illustrate our ideas using worlds similar to those given by
[1] and [25]. We choose to begin by reproducing symmetry-based representations using
our framework because (1) symmetry-based representations describe transformations of
the world that have formed relatively simple and well-understood algebraic structures
(groups), (2) groups, and the symmetries they describe, are gaining increasing promi-
nence in artificial intelligence research, (3) it shows how our framework encompasses
previous work in formalising the structure of transformations of a world, and (4) it
provides a more rigorous description of SBDRL, which should aid future analysis and
development of the concept.
Section 3.1 provides a description of SBDRL, and Section 3.2 shows how to get to
SBDRL using an equivalence relation on the actions of the agent. Section 3.3 provides
information on the algorithmic exploration of world structures performed on example
worlds. Section 3.3.1 goes through a worked example. Finally, Section 3.4 shows the
conditions of the world that are required for the actions of an agent to be fully described
by SBDRs.
3.1. Symmetry-based disentangled representation learning
We will now present a more detailed description of the SBDRL formalism.
From world states to representation states. The world state is an element of a set π‘Š
of all possible world states. The observations of a particular world state made by the
agent’s sensors are elements of the set 𝑂of all possible observations. The agent’s internal
state representation of the world state is an element of a set 𝑍of all possible internal
state representations. There exists a composite mapping 𝑓=β„Žβ—¦π‘:π‘Šβ†’π‘that maps
world states to states of the agent’s representation (𝑀↦→ 𝑧); this composite mapping is
made up of the mapping of an observation process 𝑏:π‘Šβ†’π‘‚that maps world states
Figure 4: The composite mapping from the set π‘Šof world states to the set 𝑍of state representations
via the set 𝑂of observations.
to observations (𝑀↦→ π‘œ) and the mapping of an inference process β„Ž:𝑂→𝑍that maps
observations to the agent’s internal state representation (π‘œβ†¦β†’ 𝑧) (see Figure 4).
Groups and symmetries.
Definition 3.1 (Group).A group 𝐺is a set with a binary operation 𝐺×𝐺→𝐺,
(𝑔, 𝑔′) ↦→ 𝑔◦𝑔′called the composition of group elements that satisfies the following
properties:
1. Closure. 𝑔◦𝑔′is defined for all 𝑔, π‘”β€²βˆˆπΊ.
2. Associativity. (𝑔 β—¦ 𝑔′) β—¦ 𝑔′′ = 𝑔 β—¦ (𝑔′ β—¦ 𝑔′′) for all 𝑔, 𝑔′, 𝑔′′ ∈ 𝐺.
3. Identity. There exists a unique identity element 1∈𝐺such that 1◦𝑔=𝑔◦1=𝑔
for all π‘”βˆˆπΊ.
4. Inverse. For any π‘”βˆˆπΊ, there exists π‘”βˆ’1∈𝐺such that π‘”β—¦π‘”βˆ’1=π‘”βˆ’1◦𝑔=1.
Applying symmetries to objects is mathematically defined as a group action.
Definition 3.2 (Group action).Given a group 𝐺and a set 𝑋, a group action of 𝐺on
𝑋is a map 𝐺×𝑋→𝑋,(𝑔, π‘₯) ↦→ π‘”βˆ—π‘₯that satisfies the following properties:
1. Compatibility with composition. The composition of group elements and the group
action are compatible: 𝑔′◦ ( π‘”βˆ—π‘₯)=(𝑔′◦𝑔) βˆ— π‘₯for 𝑔, π‘”β€²βˆˆπΊand π‘₯βˆˆπ‘‹.
2. Identity. The group identity 1∈𝐺leaves the elements of 𝑋unchanged: 1βˆ—π‘₯=π‘₯
for all π‘₯βˆˆπ‘‹.
Another important property of groups is commutativity. Two elements of a group
commute if the order in which they are composed does not matter: 𝑔 β—¦ 𝑔′ = 𝑔′ β—¦ 𝑔. If all elements
of a group commute with each other then the group is called commutative. Two subgroups
of a group commute with each other if every element of one commutes with every element
of the other.
Symmetry-based representations. The set π‘Šof world states has a set of symmetries that
are described by the group 𝐺. This group 𝐺acts on the set π‘Šof world states via a
group action Β·π‘Š:πΊΓ—π‘Šβ†’π‘Š. For the agent’s representations π‘§π‘–βˆˆπ‘to be symmetry-
based representations, a corresponding group action ·𝑍:𝐺×𝑍→𝑍must be found so
that the symmetries of the agent’s representations reflect the symmetries of the world
states. The mathematical condition for this is that, for all π‘€βˆˆπ‘Šand all π‘”βˆˆπΊ, applying
the action π‘”Β·π‘Što 𝑀and then applying the mapping 𝑓gives the same result as first
applying the mapping 𝑓to 𝑀to give 𝑓(𝑀)and then applying the action 𝑔·𝑍to 𝑓(𝑀).
Mathematically, this is 𝑓(π‘”Β·π‘Šπ‘€)=𝑔·𝑍𝑓(𝑀). If this condition is satisfied, then 𝑓is
called a group-equivariant map.
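The equivariance condition 𝑓(𝑔 Β·π‘Š 𝑀) = 𝑔 ·𝑍 𝑓(𝑀) can be verified numerically on a small assumed example: a four-state cyclic world whose representation places states on a circle, with the group acting by shifts on π‘Š and by the corresponding rotations on 𝑍. All concrete choices below are ours, for illustration.

```python
# Numerically checking f(g .W w) = g .Z f(w) for a toy cyclic world W = Z_4.
import math

W = [0, 1, 2, 3]

def act_W(g, w):
    """Group action on W: shift the state by g (mod 4)."""
    return (w + g) % 4

def f(w):
    """Representation map: place state w at angle w * 90 degrees on a circle."""
    return (math.cos(w * math.pi / 2), math.sin(w * math.pi / 2))

def act_Z(g, z):
    """Corresponding group action on Z: rotate the representation by g * 90 degrees."""
    c, s = math.cos(g * math.pi / 2), math.sin(g * math.pi / 2)
    x, y = z
    return (c * x - s * y, s * x + c * y)

def is_equivariant():
    """f is group-equivariant iff both paths around the square agree everywhere."""
    return all(
        math.isclose(p, q, abs_tol=1e-9)
        for g in range(4) for w in W
        for p, q in zip(f(act_W(g, w)), act_Z(g, f(w)))
    )
```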
Symmetry-based disentangled representations. To go from symmetry-based representa-
tions to symmetry-based disentangled representations, suppose the group of symmetries
𝐺of the set π‘Šof world states decomposes as a direct product 𝐺=𝐺1Γ—. . . ×𝐺𝑖×. . . ×𝐺𝑛.
The group action ·𝑍:𝐺×𝑍→𝑍and the set 𝑍are disentangled with respect to the
decomposition of 𝐺, if there is a decomposition 𝑍=𝑍1Γ—. . . ×𝑍𝑖×. . . ×𝑍𝑛and actions
·𝑍𝑖:𝐺𝑖×𝑍𝑖→𝑍𝑖, 𝑖 ∈ {1, . . . , 𝑛 }such that (𝑔𝐺1, 𝑔𝐺2) ·𝑍(𝑧𝑍1, 𝑧 𝑍2)=(𝑔𝐺1·𝑍1𝑧𝑍1, 𝑔𝐺2·𝑍2𝑧𝑍2),
where π‘”πΊπ‘–βˆˆπΊπ‘–and π‘§π‘π‘–βˆˆπ‘π‘–. In other words, each subspace 𝑍𝑖is invariant to the action
of all the 𝐺𝑗≠𝑖and only affected by 𝐺𝑖.
Summary. The representations in 𝑍are symmetry-based disentangled with respect to
the decomposition 𝐺=𝐺1Γ—. . . ×𝐺𝑖×. . . ×𝐺𝑛, where each 𝐺𝑖acts on a disjoint part of
𝑍, if:
1. There exists a group action Β·π‘Š:πΊΓ—π‘Šβ†’π‘Šand a corresponding group action
·𝑍:𝐺×𝑍→𝑍;
2. The map 𝑓:π‘Šβ†’π‘is group-equivariant between the group actions on π‘Šand 𝑍:
𝑔·𝑍𝑓(𝑀)=𝑓(π‘”Β·π‘Šπ‘€). In other words, the diagram
      𝑀  ──── 𝑔 Β·π‘Š ───→  𝑔 Β·π‘Š 𝑀
      β”‚                       β”‚
      𝑓                       𝑓
      ↓                       ↓
    𝑓(𝑀) ─── 𝑔 ·𝑍 ───→  𝑔 ·𝑍 𝑓(𝑀) = 𝑓(𝑔 Β·π‘Š 𝑀)
commutes.
3. There exists a decomposition of the representation 𝑍 = 𝑍1 Γ— . . . Γ— 𝑍𝑛 such that each
subspace 𝑍𝑖 is unaffected by the action of every 𝐺𝑗 with 𝑗 β‰  𝑖 and is only affected by 𝐺𝑖.
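A minimal sketch of a disentangled action, assuming 𝐺 = 𝐺1 Γ— 𝐺2 is a product of two cyclic shift groups acting componentwise on 𝑍 = 𝑍1 Γ— 𝑍2 (the sizes are chosen arbitrarily for illustration):

```python
# A disentangled group action: (g1, g2) .Z (z1, z2) = (g1 .Z1 z1, g2 .Z2 z2),
# so each subspace Z_i is affected only by G_i. Sizes are illustrative.

N1, N2 = 4, 3   # |Z1| and |Z2|

def act_Z(g, z):
    """Componentwise action of G1 x G2 on Z1 x Z2 (cyclic shifts)."""
    (g1, g2), (z1, z2) = g, z
    return ((z1 + g1) % N1, (z2 + g2) % N2)

def first_factor_invariant():
    """Z1 is unchanged by every element of the form (0, g2), i.e. by G2 alone."""
    return all(
        act_Z((0, g2), (z1, z2))[0] == z1
        for g2 in range(N2) for z1 in range(N1) for z2 in range(N2)
    )
```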
Limitations of SBDRL. Both [1] and [25] suggest that these group actions can be used
to describe some types of real-world actions. However, it is important to note that they
do not believe that all actions can be described by their formalism: β€œIt is important to
mention that not all actions are symmetries, for instance, the action of eating a collectible
item in the environment is not part of any group of symmetries of the environment because
it might be irreversible.” [25, page 4].
3.2. SBDRL through equivalence
For the algebra of the actions of our agent to form a group, we need some sense of
actions being the same so that the algebra can satisfy the group properties (e.g., for the
identity property we need an element 1 in the algebra 𝐴such that 1π‘Ž=π‘Ž1=π‘Žfor any
π‘Žβˆˆπ΄). We define an equivalence relation on the elements of 𝐴that says two actions
are equivalent (our sense of the actions being the same) if they lead to the same end
world state when performed in any initial world state. This equivalence relation is based
on our mathematical interpretation of the implication given by [1] that transformations
of the world are the same if they have the same effect, which is used to achieve the
group structure for SBDRL. Our use of an equivalence relation was inspired by [65],
which uses a similar equivalence relation to equate action sequences that cause the same
final observation state after each action sequence is performed from an initial observation
state. We then derive some properties of the equivalence classes created by ∼that will be
used to show that the actions of an agent form the group action described by [1] under
the equivalence relations we define and for worlds satisfying certain conditions.
Definition 3.3 (Equivalence of actions under ∼).Given two actions π‘Ž, π‘Žβ€²βˆˆπ΄, we denote
π‘ŽβˆΌπ‘Žβ€²if π‘Žβˆ—π‘€=π‘Žβ€²βˆ—π‘€for all π‘€βˆˆπ‘Š.
Remark 3.2.1. If π‘Ž ∼ π‘Žβ€², then for each 𝑀 ∈ π‘Š either (1) there exist transitions
𝑑: 𝑀 βˆ’π‘Žβ†’ 𝑑(𝑑) and 𝑑′: 𝑀 βˆ’π‘Žβ€²β†’ 𝑑(𝑑), or (2) there exist no transitions 𝑑: 𝑀 βˆ’π‘Žβ†’ 𝑑(𝑑) or
𝑑′: 𝑀 βˆ’π‘Žβ€²β†’ 𝑑(𝑑).
Proposition 3.1. ∼is an equivalence relation.
Proof. Reflexive. π‘Ž βˆ— 𝑀 = π‘Ž βˆ— 𝑀 for all 𝑀 ∈ π‘Š, and so π‘Ž ∼ π‘Ž.
Transitive. If π‘Ž ∼ π‘Žβ€² and π‘Žβ€² ∼ π‘Žβ€²β€², then π‘Ž βˆ— 𝑀 = π‘Žβ€² βˆ— 𝑀 for all 𝑀 ∈ π‘Š and π‘Žβ€² βˆ— 𝑀 = π‘Žβ€²β€² βˆ— 𝑀
for all 𝑀 ∈ π‘Š. Therefore π‘Ž βˆ— 𝑀 = π‘Žβ€²β€² βˆ— 𝑀 for all 𝑀 ∈ π‘Š, and so π‘Ž ∼ π‘Žβ€²β€².
Symmetric. If π‘Ž ∼ π‘Žβ€², then π‘Ž βˆ— 𝑀 = π‘Žβ€² βˆ— 𝑀 for all 𝑀 ∈ π‘Š. Therefore π‘Žβ€² βˆ— 𝑀 = π‘Ž βˆ— 𝑀 for
all 𝑀 ∈ π‘Š, and so π‘Žβ€² ∼ π‘Ž. β–‘
Figure 5 shows the effect of applying the equivalence relation ∼ to our 2 Γ— 2 cyclical
example world 𝒲𝑐.
We define the canonical projection map πœ‹π΄:𝐴→𝐴/∼ that sends actions in 𝐴to
their equivalence classes under ∼in the set 𝐴/∼. We denote the equivalence class of π‘Ž
by [π‘Ž]∼. Sometimes we will drop the [π‘Ž]∼in favour of π‘Žβˆˆπ΄/∼ for ease.
Composition of actions. We define the composition of elements in 𝐴/∼ as β—¦: (𝐴/∼) Γ— (𝐴/∼) β†’ (𝐴/∼) such that [π‘Žβ€²]∼ β—¦ [π‘Ž]∼ = [π‘Žβ€² β—¦ π‘Ž]∼ for π‘Ž, π‘Žβ€² ∈ 𝐴.
Proposition 3.2. [π‘Žβ€²]βˆΌβ—¦ [π‘Ž]∼=[π‘Žβ€²β—¦π‘Ž]∼is well-defined for all π‘Ž, π‘Žβ€²βˆˆπ΄.
Proof. We need to show that the choice of representatives π‘Ž, π‘Žβ€² does not matter: if π‘Ž1 ∼ π‘Ž2 and π‘Ž3 ∼ π‘Ž4
for π‘Ž1, π‘Ž2, π‘Ž3, π‘Ž4 ∈ 𝐴, then [π‘Ž3 β—¦ π‘Ž1]∼ = [π‘Ž4 β—¦ π‘Ž2]∼. Since actions are unrestricted in π‘Š,
for any world state and any action there is a transition with its source at that world state
that is labelled by that action. π‘Ž1 ∼ π‘Ž2 means there exist transitions 𝑑𝑖: 𝑠(𝑑1) βˆ’π‘Žπ‘–β†’ 𝑑(𝑑1)
for 𝑖 = 1, 2. π‘Ž3 ∼ π‘Ž4 means there exist transitions 𝑑𝑗: 𝑠(𝑑3) βˆ’π‘Žπ‘—β†’ 𝑑(𝑑3) for 𝑗 = 3, 4, and so
there exist transitions 𝑑𝑗: 𝑑(𝑑1) βˆ’π‘Žπ‘—β†’ 𝑑(𝑑3) for 𝑗 = 3, 4. Therefore there exist
(𝑑3 β—¦ 𝑑1): 𝑠(𝑑1) βˆ’π‘Ž3β—¦π‘Ž1β†’ 𝑑(𝑑3) and (𝑑4 β—¦ 𝑑2): 𝑠(𝑑1) βˆ’π‘Ž4β—¦π‘Ž2β†’ 𝑑(𝑑3), so
𝑠(𝑑3 β—¦ 𝑑1) = 𝑠(𝑑4 β—¦ 𝑑2) and 𝑑(𝑑3 β—¦ 𝑑1) = 𝑑(𝑑4 β—¦ 𝑑2). Hence (π‘Ž3 β—¦ π‘Ž1) ∼ (π‘Ž4 β—¦ π‘Ž2), and so
[π‘Ž3 β—¦ π‘Ž1]∼ = [π‘Ž4 β—¦ π‘Ž2]∼. β–‘
Figure 5: Action equivalence classes in 𝐴/∼ for the actions shown in Figure 3.
Effect of equivalent actions on world states. We define the effect of an element of 𝐴/∼
on world states as βˆ—: (𝐴/∼) Γ— π‘Š β†’ π‘Š such that [π‘Ž]∼ βˆ— 𝑀 = π‘Ž βˆ— 𝑀. Note that this is only
defined if there exists 𝑑: 𝑀 βˆ’π‘Žβ†’ 𝑑(𝑑) for some 𝑑 ∈ 𝐷𝐴; if not, then [π‘Ž]∼ βˆ— 𝑀 is undefined.
Proposition 3.3. [π‘Ž]βˆΌβˆ—π‘€is well-defined for all π‘Žβˆˆπ΄and for all π‘€βˆˆπ‘Š.
Proof. We need to show that π‘Ž1 βˆ— 𝑀 = π‘Ž2 βˆ— 𝑀 if [π‘Ž1]∼ = [π‘Ž2]∼ for π‘Ž1, π‘Ž2 ∈ 𝐴 and 𝑀 ∈ π‘Š.
If [π‘Ž1]∼ = [π‘Ž2]∼, then π‘Ž1 ∼ π‘Ž2. Since actions are unrestricted in π‘Š, for any world state
and any action there is a transition with its source at that world state that is labelled by
that action. Therefore there exist transitions 𝑑𝑖: 𝑀 βˆ’π‘Žπ‘–β†’ 𝑑(𝑑1) for 𝑖 = 1, 2, and so
π‘Ž1 βˆ— 𝑀 = 𝑑(𝑑1) and π‘Ž2 βˆ— 𝑀 = 𝑑(𝑑1). β–‘
Reversible actions. An action π‘Ž ∈ 𝐴 is called reversible in a given state 𝑀 ∈ π‘Š if π‘Ž ∈ 𝐴𝑅𝑀,
where 𝐴𝑅𝑀 = {π‘Ž ∈ 𝐴 | there exists π‘Žβ€² ∈ 𝐴 such that π‘Žβ€² β—¦ π‘Ž βˆ— 𝑀 = 𝑀}. An action π‘Ž ∈ 𝐴
is called reversible if it is reversible in every 𝑀 ∈ π‘Š. An action that is not reversible is
called irreversible.
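Reversibility can be checked by searching for an action π‘Žβ€² with π‘Žβ€² β—¦ π‘Ž βˆ— 𝑀 = 𝑀. The toy world below, with a hypothetical irreversible "eat" action in the spirit of the collectible-item example quoted earlier, is illustrative only.

```python
# Checking reversibility: a is reversible in w if some a' returns the world
# to w, and reversible outright if that holds in every state. Toy actions only.

EFFECT = {
    "noop": {"w0": "w0", "w1": "w1"},
    "eat":  {"w0": "w1", "w1": "w1"},   # once in w1, no action restores w0
}

def reversible_in(a, w, actions=EFFECT):
    """True if some action b satisfies b * (a * w) == w."""
    return any(actions[b][actions[a][w]] == w for b in actions)

def reversible(a, actions=EFFECT):
    """True if a is reversible in every world state."""
    return all(reversible_in(a, w, actions) for w in actions[a])
```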
Properties of the quotient set 𝐴/∼.
Proposition 3.4. (𝐴/∼,β—¦) has an identity element.
Proof. To show that (𝐴/∼, β—¦) has an identity element, we show that there is an element
𝑒 ∈ 𝐴 which satisfies (a) [π‘Ž]∼ β—¦ [𝑒]∼ = [π‘Ž]∼ and (b) [𝑒]∼ β—¦ [π‘Ž]∼ = [π‘Ž]∼ for all π‘Ž ∈ 𝐴. We
will prove that the identity action 1 ∈ 𝐴 satisfies these conditions. Consider any
transition 𝑑: 𝑠(𝑑) βˆ’π‘Žβ†’ 𝑑(𝑑) labelled by any action π‘Ž ∈ 𝐴.
(a) There exists a transition 1𝑠(𝑑): 𝑠(𝑑) βˆ’1β†’ 𝑠(𝑑) due to action condition 2. Since
𝑑(1𝑠(𝑑)) = 𝑠(𝑑), the composite π‘Ž β—¦ 1 is defined for 𝑑, with 𝑠(π‘Ž β—¦ 1) = 𝑠(1) = 𝑠(𝑑) = 𝑠(π‘Ž) and
𝑑(π‘Ž β—¦ 1) = 𝑑(π‘Ž). Therefore π‘Ž β—¦ 1 ∼ π‘Ž, so [π‘Ž β—¦ 1]∼ = [π‘Ž]∼ and hence [π‘Ž]∼ β—¦ [1]∼ = [π‘Ž]∼.
(b) There exists a transition 1𝑑(𝑑): 𝑑(𝑑) βˆ’1β†’ 𝑑(𝑑) due to action condition 2. Since
𝑠(1𝑑(𝑑)) = 𝑑(𝑑), the composite 1 β—¦ π‘Ž is defined for 𝑑, with 𝑠(1 β—¦ π‘Ž) = 𝑠(π‘Ž) and
𝑑(1 β—¦ π‘Ž) = 𝑑(1) = 𝑑(π‘Ž). Therefore 1 β—¦ π‘Ž ∼ π‘Ž, so [1 β—¦ π‘Ž]∼ = [π‘Ž]∼ and hence [1]∼ β—¦ [π‘Ž]∼ = [π‘Ž]∼.
Therefore 1 ∈ 𝐴 satisfies the conditions for [1]∼ being an identity element in (𝐴/∼, β—¦). β–‘
Proposition 3.5. β—¦is associative with respect to (𝐴/∼,β—¦).
Proof. For β—¦ to be associative we need [π‘Ž1]∼ β—¦ ([π‘Ž2]∼ β—¦ [π‘Ž3]∼) = ([π‘Ž1]∼ β—¦ [π‘Ž2]∼) β—¦ [π‘Ž3]∼ for
any π‘Ž1, π‘Ž2, π‘Ž3 ∈ 𝐴. We have π‘Ž1 β—¦ (π‘Ž2 β—¦ π‘Ž3) = (π‘Ž1 β—¦ π‘Ž2) β—¦ π‘Ž3 from the associativity of β—¦ with
respect to (𝐴, β—¦), and [π‘Žβ€²]∼ β—¦ [π‘Ž]∼ = [π‘Žβ€² β—¦ π‘Ž]∼ for any π‘Ž, π‘Žβ€² ∈ 𝐴 by the definition of β—¦ on 𝐴/∼.
Therefore [π‘Ž1]∼ β—¦ ([π‘Ž2]∼ β—¦ [π‘Ž3]∼) = [π‘Ž1 β—¦ (π‘Ž2 β—¦ π‘Ž3)]∼ = [(π‘Ž1 β—¦ π‘Ž2) β—¦ π‘Ž3]∼ = [(π‘Ž1 β—¦ π‘Ž2)]∼ β—¦ [π‘Ž3]∼ =
([π‘Ž1]∼ β—¦ [π‘Ž2]∼) β—¦ [π‘Ž3]∼. β–‘
In summary, we have (𝐴, β—¦,βˆ—), which is a set 𝐴along with two operators β—¦:𝐴×𝐴→𝐴
and βˆ—:π΄Γ—π‘Šβ†’π‘Š, and we have (𝐴/∼,β—¦,βˆ—), which is a set 𝐴/∼ along with two operators
β—¦:(𝐴/∼) Γ— ( 𝐴/∼) β†’ ( 𝐴/∼) and βˆ—:(𝐴/∼) Γ— π‘Šβ†’π‘Š. We have shown that β—¦is associative
with respect to (𝐴/∼,β—¦), and that (𝐴/∼,β—¦) has an identity element by action condition
2.
3.3. Algorithmic exploration of world structures
To gain an intuition for the structure of different worlds and to illustrate our theo-
retical work with examples, we developed an algorithm that uses an agent’s minimum
actions to generate the algebraic structure of the transformations of a world due to the
agent’s actions. We display this structure as a generalised Cayley table (a multiplication
table for the distinct elements of the algebra). Implementation of this algorithm can be
found at github.com/awjdean/CayleyTableGeneration.
Cayley table generating algorithm. First, we generate what we call a state Cayley table
(Algorithm 1). Each element of this state Cayley table is the world state reached when
the row action and then the column action are performed in succession from an initial
world state 𝑀 (i.e., column label βˆ— (row label βˆ— 𝑀)). Once the state Cayley table has
been generated, we use it to generate the action Cayley table, in which each element
of the table is the element of the algebra equivalent to performing the row
action followed by the column action (Algorithm 2).
Algorithm 1 Generate state Cayley table.
Require: minimum_actions: a list of minimum actions; 𝑀: initial world state.
1: state_cayley_table ← an empty square matrix with dimensions len(minimum_actions) Γ— len(minimum_actions), with rows and columns labelled by minimum_actions.
2: for π‘Ž in minimum_actions do
3:   Create an equivalence class for π‘Ž.
4:   state_cayley_table ← AddElementToStateCayleyTable(state_cayley_table, 𝑀, π‘Ž). ⊲ See Algorithm 3.
5: end for
6: for row_label in state_cayley_table do
7:   equivalents_found ← SearchForEquivalents(state_cayley_table, 𝑀, row_label). ⊲ See Algorithm 5.
8:   if len(equivalents_found) β‰  0 then
9:     Merge the equivalence classes of equivalent minimum actions.
10:    Delete the rows and columns of state_cayley_table for the minimum actions not labelling the merged equivalence class.
11:  end if
12: end for
13: Initialize an empty list candidate_cayley_table_elements.
14: candidate_cayley_table_elements ← SearchForNewCandidates(state_cayley_table, 𝑀, candidate_cayley_table_elements). ⊲ See Algorithm 4.
15: while len(candidate_cayley_table_elements) > 0 do
16:   π‘ŽπΆ ← pop an element from candidate_cayley_table_elements.
17:   equivalents_found ← SearchForEquivalents(state_cayley_table, 𝑀, π‘ŽπΆ). ⊲ See Algorithm 5.
18:   if len(equivalents_found) β‰  0 then
19:     Add π‘ŽπΆ to the relevant equivalence class.
20:     Continue to the next iteration of the while loop.
21:   else
22:     Check if π‘ŽπΆ breaks any of the existing equivalence classes.
23:     broken_equivalence_classes ← SearchForBrokenEquivalenceClasses(state_cayley_table, 𝑀, π‘ŽπΆ). ⊲ See Algorithm 6.
24:     if len(broken_equivalence_classes) β‰  0 then
25:       for each new equivalence class do
26:         state_cayley_table ← AddElementToStateCayleyTable(state_cayley_table, 𝑀, element labelling new equivalence class). ⊲ See Algorithm 3.
27:       end for
28:     end if
29:     Create a new equivalence class for π‘ŽπΆ.
30:     state_cayley_table ← AddElementToStateCayleyTable(state_cayley_table, 𝑀, π‘ŽπΆ). ⊲ See Algorithm 3.
31:   end if
32:   candidate_cayley_table_elements ← SearchForNewCandidates(state_cayley_table, 𝑀, candidate_cayley_table_elements). ⊲ See Algorithm 4.
33: end while
34: return state_cayley_table
Algorithm 2 Generate action Cayley table.
Require: state_cayley_table.
1: action_cayley_table ← an empty square matrix with the dimensions of state_cayley_table, with rows and columns labelled by the rows and columns of state_cayley_table.
2: for row_label in action_cayley_table do
3:   for column_label in action_cayley_table do
4:     π‘ŽπΆ ← column_label β—¦ row_label.
5:     ec_label ← label of the equivalence class containing π‘ŽπΆ.
6:     action_cayley_table[row_label][column_label] ← ec_label.
7:   end for
8: end for
9: return action_cayley_table
Algorithm 3 AddElementToStateCayleyTable: Fill the state Cayley table row and column for element π‘Ž.
Require: state_cayley_table; 𝑀: initial world state; π‘Ž.
1: Add a new row and a new column labelled by π‘Ž to state_cayley_table.
2: for column_label in state_cayley_table do
3:   state_cayley_table[π‘Ž][column_label] ← column_label βˆ— (π‘Ž βˆ— 𝑀).
4: end for
5: for row_label in state_cayley_table do
6:   state_cayley_table[row_label][π‘Ž] ← π‘Ž βˆ— (row_label βˆ— 𝑀).
7: end for
Algorithm 4 SearchForNewCandidates: Search for new candidate elements in the state Cayley table.
Require: state_cayley_table; 𝑀: initial world state; candidate_cayley_table_elements.
1: for row_label in state_cayley_table do
2:   for column_label in state_cayley_table do
3:     π‘ŽπΆ ← column_label β—¦ row_label.
4:     equivalents_found ← SearchForEquivalents(state_cayley_table, 𝑀, π‘ŽπΆ). ⊲ See Algorithm 5.
5:     if len(equivalents_found) β‰  0 then
6:       Add π‘ŽπΆ to the relevant equivalence class.
7:     else
8:       Add π‘ŽπΆ to candidate_cayley_table_elements.
9:     end if
10:   end for
11: end for
12: return candidate_cayley_table_elements
Algorithm 5 SearchForEquivalents: Search for elements in the Cayley table that are equivalent to π‘Ž.
Require: state_cayley_table; 𝑀: initial world state; π‘Ž.
1: equivalents_found ← empty list.
2: a_row ← empty list. ⊲ Generate the state Cayley row for π‘Ž.
3: for column_label in state_cayley_table do
4:   if column_label == π‘Ž then
5:     Continue to the next iteration of the for loop.
6:   end if
7:   Append column_label βˆ— (π‘Ž βˆ— 𝑀) to a_row.
8: end for
9: a_column ← empty list. ⊲ Generate the state Cayley column for π‘Ž.
10: for row_label in state_cayley_table do
11:   if row_label == π‘Ž then
12:     Continue to the next iteration of the for loop.
13:   end if
14:   Append π‘Ž βˆ— (row_label βˆ— 𝑀) to a_column.
15: end for
16: for row_label in state_cayley_table do
17:   if (a_row, a_column) == (row_label row, row_label column) then
18:     Append row_label to equivalents_found.
19:   end if
20: end for
21: return equivalents_found
Algorithm 6 SearchForBrokenEquivalenceClasses: Find equivalence classes that are broken by π‘ŽπΆ.
Require: state_cayley_table; 𝑀: initial world state; π‘ŽπΆ.
1: for row_label in state_cayley_table do
2:   ec_label_outcome ← row_label βˆ— (π‘ŽπΆ βˆ— 𝑀).
3:   for ec_element in the equivalence class labelled by row_label do
4:     if ec_label_outcome β‰  ec_element βˆ— (π‘ŽπΆ βˆ— 𝑀) then
5:       Create a new equivalence class labelled by ec_element.
6:       Remove ec_element from the equivalence class labelled by row_label.
7:     end if
8:   end for
9: end for
10: return the new equivalence classes
Displaying the algebra. We display the algebra in two ways: (1) a 𝑀-state Cayley table,
which shows the resulting state of applying the row element to 𝑀 followed by the column
element (i.e., 𝑀-state Cayley table value = column label βˆ— (row label βˆ— 𝑀)), and (2) an action
Cayley table, which shows the resulting element of the algebra when the column element is
applied to the left of the row element (i.e., action Cayley table value = column element β—¦
row element).
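The overall procedure can be sketched as a simplified re-implementation (this is our toy code, not the code in the linked repository): compose elements breadth-first from the minimum actions, merging any two sequences that act identically on every world state, until the algebra closes. All names, and the toy world used in the test, are assumptions.

```python
# Simplified Cayley-table construction: close the set of minimum actions under
# composition, identifying sequences with identical effects on every state.

def generate_algebra(minimum_actions, states):
    """minimum_actions: dict name -> {state: next_state}.

    Returns (elements, table) where table[(row, column)] is the representative
    of column o row, matching the paper's action Cayley table convention.
    """
    def effect(seq):                 # effect of a composite, as a tuple over states
        out = []
        for w in states:
            for a in seq:            # seq = (a_1, ..., a_n), a_1 applied first
                w = minimum_actions[a][w]
            out.append(w)
        return tuple(out)

    elements = {}                    # effect signature -> representative sequence
    frontier = [(a,) for a in minimum_actions]
    while frontier:                  # breadth-first closure under composition
        seq = frontier.pop(0)
        sig = effect(seq)
        if sig in elements:          # equivalent to an element already found
            continue
        elements[sig] = seq
        frontier += [seq + (a,) for a in minimum_actions]

    reps = {sig: "".join(seq) for sig, seq in elements.items()}
    table = {
        (reps[s1], reps[s2]): reps[effect(elements[s1] + elements[s2])]
        for s1 in elements for s2 in elements
    }
    return sorted(reps.values()), table
```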
𝐴/∼ |  1    𝐷    𝐿    π‘…π‘ˆ
 1   | 𝑀0   𝑀2   𝑀1   𝑀3
 𝐷   | 𝑀2   𝑀0   𝑀3   𝑀1
 𝐿   | 𝑀1   𝑀3   𝑀0   𝑀2
π‘…π‘ˆ  | 𝑀3   𝑀1   𝑀2   𝑀0
Table 2: 𝑀0 state Cayley table for 𝐴/∼.
𝐴/∼ |  1    𝐷    𝐿    π‘…π‘ˆ
 1   |  1    𝐷    𝐿    π‘…π‘ˆ
 𝐷   |  𝐷    1    π‘…π‘ˆ   𝐿
 𝐿   |  𝐿    π‘…π‘ˆ   1    𝐷
π‘…π‘ˆ  |  π‘…π‘ˆ   𝐿    𝐷    1
Table 3: Action Cayley table for 𝐴/∼.
Algebra properties. We also check the following properties of the algebra algorithmically:
(1) the presence of identity, including the presence of left and right identity elements
separately, (2) the presence of inverses, including the presence of left and right inverses
for each element, (3) associativity, (4) commutativity, and (5) the order of each element
in the algebra. For our algorithm to successfully generate the algebra of a world, the
world must contain a finite number of states, the agent must have a finite number of
minimum actions, and all the transformations of the world must be due to the actions
of the agent.
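These property checks are straightforward to run on an action Cayley table. The sketch below encodes the table of Table 3 and checks identity, inverses, associativity, commutativity, and element orders; the checking code itself is ours, not the repository's.

```python
# Property checks on an action Cayley table (here, the table of Table 3).

ELEMENTS = ["1", "D", "L", "RU"]
TABLE = {  # TABLE[(row, column)] = column o row, as in the paper's convention
    ("1", "1"): "1",   ("1", "D"): "D",   ("1", "L"): "L",   ("1", "RU"): "RU",
    ("D", "1"): "D",   ("D", "D"): "1",   ("D", "L"): "RU",  ("D", "RU"): "L",
    ("L", "1"): "L",   ("L", "D"): "RU",  ("L", "L"): "1",   ("L", "RU"): "D",
    ("RU", "1"): "RU", ("RU", "D"): "L",  ("RU", "L"): "D",  ("RU", "RU"): "1",
}

def compose(x, y):                 # y o x
    return TABLE[(x, y)]

identity = next(e for e in ELEMENTS
                if all(compose(x, e) == x == compose(e, x) for x in ELEMENTS))

def has_inverses():
    return all(any(compose(x, y) == identity == compose(y, x) for y in ELEMENTS)
               for x in ELEMENTS)

def is_associative():
    return all(compose(compose(a, b), c) == compose(a, compose(b, c))
               for a in ELEMENTS for b in ELEMENTS for c in ELEMENTS)

def is_commutative():
    return all(compose(a, b) == compose(b, a) for a in ELEMENTS for b in ELEMENTS)

def order(x):
    """Smallest n with x composed with itself n times equal to the identity."""
    power, n = x, 1
    while power != identity:
        power, n = compose(power, x), n + 1
    return n
```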
3.3.1. Example
For our example world 𝒲𝑐, the equivalence classes shown in Figure 5 (those labelled
by 1, 𝐷, 𝐿, and π‘…π‘ˆ) are the only equivalence classes in 𝐴/∼. The 𝑀-state Cayley table in
Table 2 shows the final world state reached after the following operation: table entry =
column element βˆ— (row element βˆ— 𝑀).
The action Cayley table in Table 3 shows the equivalent action in 𝐴/∼ for the same
operation as the 𝑀-state Cayley table: [table entry] βˆ— 𝑀 = column element βˆ— (row element βˆ— 𝑀)
for all 𝑀 ∈ π‘Š.
The choice of the equivalence class label in Table 4 is arbitrary; it is better to think
of each equivalence class as a distinct element as shown in the Cayley table in Table 5.
There are four elements in the action algebra; therefore, if the agent learns the relations
between these four elements, then it has complete knowledge of the transformations of
our example world.
∼ equivalence class label | ∼ equivalence class elements
 1   | 1, 11, 𝐷𝐷, 𝐿𝐿, π‘…π‘ˆπ‘…π‘ˆ, ...
 𝐷   | 𝐷, 𝐷1, 1𝐷, π‘…π‘ˆπΏ, πΏπ‘…π‘ˆ, ...
 𝐿   | 𝐿, 𝐿1, π‘…π‘ˆπ·, 1𝐿, π·π‘…π‘ˆ, ...
π‘…π‘ˆ  | π‘…π‘ˆ, π‘…π‘ˆ1, 𝐿𝐷, 𝐷𝐿, 1π‘…π‘ˆ, ...
Table 4: Action Cayley table equivalence classes.
Property    | Present?
Totality    | Y
Identity    | Y
Inverse     | Y
Associative | Y
Commutative | Y
Table 6: Properties of the 𝐴/∼ algebra.
Element | Order
 1   | 1
 𝐷   | 2
 𝐿   | 2
π‘…π‘ˆ  | 2
Table 7: Order of elements in 𝐴/∼.
𝐴/∼ | 1 2 3 4
 1   | 1 2 3 4
 2   | 2 1 4 3
 3   | 3 4 1 2
 4   | 4 3 2 1
Table 5: Abstract action Cayley table for 𝐴/∼.
Properties of the 𝐴/∼ algebra. The properties of the 𝐴/∼ algebra are displayed in Table 6
and show that 𝐴/∼ is a commutative group, in which the no-op action is the identity and
every element is its own inverse. Since the action algebra of our example world is a
group, it can be described by SBDRL. The order of each element is given in Table 7.
3.4. Conditions for SBDRL to apply
To simplify the problem, for the remainder of this paper we only consider worlds in
which the transformations of the world are due solely to the actions of an agent, unless
otherwise stated; this means we do not have to take into consideration how the agent
would deal with transformations of the world that are not due to its actions. Therefore,
we will only consider worlds with 𝐷 = 𝐷𝐴.
To be a group, 𝐴/∼ must satisfy the properties of (1) identity, (2) associativity, (3)
closure, and (4) inverse. (1) The identity property and (2) the associativity property are
satisfied by Proposition 3.4 and Proposition 3.5 respectively. (3) For the closure property
to be satisfied, the following condition is sufficient:
World condition 1 (Unrestricted actions).For any action π‘Žβˆˆπ΄and for any world
state π‘€βˆˆπ‘Š,