REFLECTIVE ARTIFICIAL INTELLIGENCE
Peter R. Lewis
Faculty of Business and Information Technology
Dept. of Informatics
King’s College London
January 27, 2023
ABSTRACT
Artiﬁcial Intelligence (AI) is about making computers that do the sorts of things that minds can do, and
as we progress towards this goal, we tend to increasingly delegate human tasks to machines. However,
AI systems usually do these tasks with an unusual imbalance of insight and understanding: new,
deeper insights are present, yet many important qualities that a human mind would have previously
brought to the activity are utterly absent. Therefore, it is crucial to ask which features of minds have
we replicated, which are missing, and if that matters. One core feature that humans bring to tasks,
when dealing with the ambiguity, emergent knowledge, and social context presented by the world, is
reﬂection. Yet this capability is utterly missing from current mainstream AI. In this paper we ask what
reﬂective AI might look like. Then, drawing on notions of reﬂection in complex systems, cognitive
science, and agents, we sketch an architecture for reﬂective AI agents, and highlight ways forward.
Margaret Boden has described artificial intelligence as being about making ‘computers that do the sorts of things that minds can do’ [ , p1]. One strength of this definition lies in the fact that it does not start from an arbitrary description of
the things that might be necessary or sufﬁcient for a system to ‘count’ as AI, but it encourages us to ask: what are the
sorts of things that our minds do? Further, a curious mind is then tempted to ask: could we replicate these things? If so,
how? If not, why not and does that matter?
The deﬁnition also implies that there are things that human minds currently do, that in the future computers might do
instead. This is not only true now, but has always been the case throughout human history, and likely will be far into
the future. It is this transference of activity that gives rise to the seemingly constant stream of examples of new ‘AI
technologies’. These mostly do things that human minds used to, or wished to do. This is, of course, also the source of
many of the issues and beneﬁts that arise from the creation and use of AI technology: as we ﬁgure out how to replicate
some of the things that minds can do, we delegate these things to machines. This typically brings increased automation,
scale, and efﬁciency, which themselves contain the seeds of both enormous potential social and economic beneﬁt, and
potential real danger and strife.
Further, we can notice that these AI technologies usually do this with an unusual (im)balance of insight and understanding. New, deeper insight and understanding often arise from the models employed, while many of the ‘qualities’ that a
human mind would have previously brought to the activity, are utterly absent.
In designing and analysing embodied AI technologies, the concept of an intelligent agent is central, and necessitates
descriptions that are abstracted from the natural intelligences they are inspired by or modelled on. This abstraction, in
turn, means that the notion of an AI agent only partially captures the mental, cognitive, and physical features of natural
intelligence. Hence, it is important to ask: are the features that we have included sufﬁcient for what is needed? Are we
satisﬁed with leaving out those which we did?
Frank and Virginia Dignum have recently reminded us how powerful the concept of an agent is in AI [ ]. They have also pointed out some of the paradigmatic failures of agent-based modelling (ABM) and multi-agent systems (MAS). ABM
methodologies aim to describe a large and complex system of agent populations by using analysis tools in the form
of agent-based simulation. MAS methodologies, instead, focus on the operational side of interacting systems, where
agents operate to create changes in their environment. However, neither methodology is ﬁt for designing human-like
agent architectures. The focus of their discussion [ ] is to propose a social MAS architecture and argue that future socially-aware AI architectures should be different from today’s common utility- and goal-driven models. A similar proposal was made by Ron Sun years ago, namely that agent methodologies need Cognitive Science and vice-versa [ ], to address complex socially-aware AI and be able to design such architectures. Antonio Lieto refreshes this proposal [ ].
In this paper we continue this line of thought, sketching an agent architecture that captures some reﬂective capabilities,
based on cognitive theories. Due to the complex and modular nature of reflection, it is impossible to find a single, crisp, and clear definition of the term reflection [ ]. Reducing the definition to a single process or component of an architecture would fail to address the richness of this cognitive process and would be counter-productive in explaining how all of the processes and components at play interact for human-like reflection to happen. Thus, in order to do it justice, similarly to [ ], we adopt a differentiated theory approach to convey the notion of reflection.
2 Playing Chess Isn’t Just About Chess
So what are the sorts of things minds do? As Richard Bellman suggested in 1978 [ ], these include activities such
as ‘decision-making, problem solving, learning...’ And the sheer quantity of research on machines that can do these
activities is astounding. Yet as Bellman’s ellipsis suggests, this is clearly an incomplete list. Perhaps any such list would
be. It might be more useful to think situationally. We can ask: which features of our minds do we bring to different
activities? Let us explore a thought experiment using a canonical example: chess. When playing chess, we largely
bring the ability to reason, to plan ahead, to use heuristics, and to remember and recall sequences of moves, such as the
Caro-Kann Defence. Against an anonymous opponent on the Internet, we might try to use these abilities as best we can.
When playing chess with a child, however, we might typically bring a few more features too: patience, empathy (for
example to understand the child’s current mental model of the game to help coach them), and also some compassion,
since proﬁcient chess players could likely beat most children every time and make it less interesting all round. Letting
children win is also not helpful, but a parent might play out a different sequence of moves to open up more in-game
experiences from time to time. As a young player grows up, beneﬁting from both more brain development and
experience at chess, and ﬁnds joy in different parts of the game, the way an adult opponent might do this will change. A
good teacher might think back over previous games, reﬂect on the changes in the child’s understanding and reasoning,
and responses to moves. They might use this to speculate on and mentally play out possible future games.
This chess example illustrates three points: (i) even playing chess is not just about problem solving; (ii) rather
unsurprisingly, our mental features are rich, contextual, and ﬂexible; and (iii) we reﬂect on our situations, our current
and past behaviour in them, and the likely outcome of those behaviours including the impact on others, in order to
choose which mental features to engage. This is not just about ﬂexible behaviour selection, it’s about which mechanisms
– which of Boden’s sorts of things – even kick in. What can this teach us about how we might want to build AI systems?
Returning to the idea that we are delegating mental activity to machines, it tells us that perhaps we might want to have a
similar ability in AI agents.
3 The Dangers of Incomplete Minds
Many people tie themselves in knots trying to define ‘intelligence’, hoping that this will lead to a somewhat more complete (and, they often say, more helpful) definition of ‘artificial intelligence’. One example of such a discussion can be found in a recent special issue of the Journal of Artificial General Intelligence [ ]. As pointed out by Sloman in that
collection, much of this deﬁnitional wrangling misses the point, at least from the perspective of deciding when we want
to accept a computer to replace part of the activity previously done in society by human minds. Better questions might
be to ask: what can this thing do; and what is it for? Consider: if we are deciding to put a machine in a position where it
is carrying out a task in a way that we are satisﬁed is equivalent to what previously only a human mind could do, we
have admitted something about the nature of either the task, or the machine, or our minds. Perhaps an AI system is
simply a machine that operates sufﬁciently similarly to our mind, at least in some situations, that we are prepared to
accept the machine operating in lieu of us. So this leads us to ask when and why we would be prepared to accept this.
Or perhaps, given that most AI systems (and minds) cannot be fully understood or controlled, when and why we would be prepared to trust it to do so [ ].
In one recent example, the seemingly harmless act of allowing a ‘smart’ voice assistant to propose entertainment activities to a child led to a life-threatening suggestion from a supposedly trusted AI [ ]. Normally, when delegating the
proposal of children’s play activities, we would expect that the person we had delegated that to would have not only a
decent dose of common sense, but also the ability to consider the potential consequences of any ideas that sprung to
mind before vocalizing them.
In another now well-known example, Amazon’s automated recruiting tool, trained on data from previous hiring
decisions, discriminated based on gender for technical jobs [ ]. Here, the delegation is from professional recruiters and
hiring managers to a computer that replicates (some of) the mental activity they used to do. The aims are automation,
scale, and efﬁciency. That such a sexist system was put into practice at all is at the very least unfortunate and negligent.
It is also tempting to argue that these are ‘just bad apples’, and that better regulation is the answer. It may be. But what
is particularly interesting in our context is that people – hiring managers, shareholders, applicants – trusted the system
to do something that, previously, a human mind did. But unlike the mind of the professional it replaced, it had no way
of reﬂecting on the social or ethical consequences, or on the virtue or social value of its actions, or even if its actions
were congruent with prevailing norms or values. That it had no way of reﬂecting on this meant that it also stood no
chance at stopping or correcting itself. Indeed, neither of the above AI systems even had the mental machinery to do
such a thing – this part of the mental activity is, as yet, nowhere near delegated. This leads to an unusual divorce of
accompanying mental qualities that would normally work in concert. No wonder the behaviour might seem a little
pathological. As humans, a core part of our intelligence is our ability to reﬂect in these ways; reﬂection is a core mental
mechanism that we use to evaluate ourselves. The existence of this form of self-awareness and self-regulation can be
key to why others may ﬁnd us trustworthy. Could we expect the same of machines?
4 The Role of Reﬂection in Driving Human Behaviour
One aspect of reflection is captured by what Socrates called his daemon [ ], something that checked him ‘from any act opposed to his true moral and intellectual interests’ [ ]. Socrates saw this as a divine signal, not proposing action,
but monitoring it, and intervening if necessary. If such a check were based on morals or ethics, we might call this a
conscience. If it were based on broader goals than simply the immediate (for example, choosing a chess move to make
against your daughter), we might call this considering the bigger picture. Essentially, this is a process that notices
what we are thinking, what we are considering doing, and allows and explores the thought, but can prevent the action.
It decides whether to do this by contextualising the action. Contexts, as alluded to above, might be ethical, cultural,
political, social, or based on non-immediate (higher-level, longer-term, or not immediately visible) goals.
What Socrates presents here requires a ‘Popperian’ mind, according to Dennett’s Tower of Generate and Test [ ]. Essentially, in what he describes as a framework for ‘design options for brains’, Dennett notes that (at the bottom of
the Tower) the testing of hypotheses is done by Darwinian evolution: hypotheses are generated through mutations and
the placing of novel organisms in the world and tested through their survival. Above this, Skinnerian creatures test
hypotheses by taking actions and learning in an operant conditioning fashion, based on environmental feedback within
their lifetime. Higher still are Popperian and Gregorian creatures, which have the mental capability to bring hypothesis
testing internally to their mind, rather than requiring it to be done in the world. Both of these operate with forms of
reﬂection: put simply, Popperian creatures think about what to think, and Gregorian creatures, using tools, language,
and culture, extend this to think about how to think.
One plausible way these Popperian and Gregorian creatures’ minds might work is Hesslow’s Simulation Theory of Cognition [ ]. Hesslow’s hypothesis is that there exists a mechanism in the brain that helps agents reason about
the consequences of their actions in an environment by simulating the stimuli of their behaviour in that environment,
without having this behaviour previously reinforced by actual stimuli generated by their past behaviour. For example,
this mechanism allows an agent to think about the deadly consequences of driving towards a concrete wall at high speed
without having done it beforehand.
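As a rough, purely illustrative sketch of this kind of internal simulation (not Hesslow’s model itself; the forward model, actions, and outcome values below are invented placeholders), an agent might evaluate candidate actions against predicted consequences before ever executing them:

# Minimal sketch of internal simulation ("what would happen if I did X?").
# The forward model here is a hand-written dictionary standing in for a
# learned model of the environment; in a real system it would be learned.

FORWARD_MODEL = {
    ("drive", "towards_wall_fast"): {"damage": 1.0, "progress": 0.0},
    ("drive", "around_wall"):       {"damage": 0.0, "progress": 0.8},
    ("wait",  "in_place"):          {"damage": 0.0, "progress": 0.0},
}

def simulate(action, manner):
    """Predict the stimuli/outcome of an action without executing it."""
    return FORWARD_MODEL.get((action, manner), {"damage": 0.5, "progress": 0.0})

def choose_action(candidates):
    """Internally simulate each candidate and pick the best safe one."""
    safe = []
    for action, manner in candidates:
        outcome = simulate(action, manner)
        if outcome["damage"] < 0.5:          # reject predicted-harmful actions
            safe.append(((action, manner), outcome["progress"]))
    if not safe:
        return ("wait", "in_place")          # fall back to doing nothing
    return max(safe, key=lambda pair: pair[1])[0]

if __name__ == "__main__":
    options = [("drive", "towards_wall_fast"), ("drive", "around_wall")]
    print(choose_action(options))            # -> ('drive', 'around_wall')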
Schön [ ] compares the inherent nature of reflection in professional practice with a purely technically rational approach that might be characterised by up-front specification and subsequent problem solving. As opposed to passive problem solving, active experimentation is emphasised in how professional practitioners deal ‘on-the-fly’ with the uncertainty, ambiguity and emergent knowledge inherent in tasks. From a technical rationality perspective, Schön argues, ‘professional practice is a process of problem solving’, yet, ‘in real-world practice, problems do not present
themselves to the practitioner as givens. They must be constructed from the materials of problematic situations which
are puzzling, troubling, and uncertain.’ This means that the sorts of things that (to continue the example) a professional
recruiter does is not simply problem solving in a deﬁned setting: there are patterns and mechanical aspects to their
work, but the problem is always somewhat uncertain, and emerges from practice and the setting. Thus, we arrive at
what Schön describes as ‘an epistemology of practice which places technical problem solving within a broader context
of reﬂective inquiry.’
Figure 1: Kolb’s ‘Experiential Learning Model’. Source: [ ]. The model captures the cognitive cycle in humans that is responsible for learning from experience.
A model that captures reﬂection in practice, that is both exploratory and governed by a sense of the bigger picture
and the principles that govern our intended direction, is Kolb’s learning cycle [ ]. His ‘Experiential Learning Model’
comprises four phases: i) having a concrete experience, ii) an observation and subjective evaluation of that experience
in context, iii) the formation of abstract concepts based upon the evaluation, and iv) the formation of an intention to test
the new concepts, leading to further experience.
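Read computationally, and only as a loose sketch rather than a claim about Kolb’s own formulation, the cycle can be written as a loop over these four phases (the toy environment, concepts, and update rules below are our own invented placeholders):

import random

# A loose computational reading of Kolb's four phases:
# experience -> reflective observation -> abstract conceptualisation -> active experimentation.

def have_experience(action):
    """Concrete experience: act in a toy 'environment' and get an outcome."""
    return {"action": action, "reward": random.random()}

def reflect(experience, history):
    """Reflective observation: evaluate the experience against past ones."""
    history.append(experience)
    mean = sum(e["reward"] for e in history) / len(history)
    return {"better_than_usual": experience["reward"] > mean}

def conceptualise(observation, concepts):
    """Abstract conceptualisation: update a (very) simple belief."""
    concepts["current_action_is_good"] = observation["better_than_usual"]
    return concepts

def experiment(concepts, current_action):
    """Active experimentation: form an intention to test the new concept."""
    return current_action if concepts["current_action_is_good"] else "try_something_else"

history, concepts, action = [], {}, "initial_action"
for _ in range(5):                      # a few turns around the cycle
    exp = have_experience(action)
    obs = reflect(exp, history)
    concepts = conceptualise(obs, concepts)
    action = experiment(concepts, action)
    print(action)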
5 Where are we now?
Reflection in humans is complex and comprises numerous related phenomena. This makes it extremely difficult, if not impossible, to find a single, crisp, and clear definition. This is usually the case with complex socio-cognitive phenomena (e.g., [ ]). Our approach instead is to contribute to building a ‘differentiated theory’, as is often done in the social sciences [ ]. This allows us to collect and compare the different ways in which phenomena all commonly referred to
as being part of ‘reﬂection’ interact. In doing so, we aim to build towards a socio-cognitive theory of reﬂection in AI.
Let us ﬁrst examine the current state of AI, in this light.
Figure 2: Critic Agent Architecture [ ]. We introduce this architecture as a baseline AI architecture that manages to capture various aspects of perceiving, learning, planning, reasoning and acting as different qualitative processes. One can visually contrast this architecture with the other mainstream architectures, old and new, in AI.
Artiﬁcial Neural Network Architectures (ANN)
Initially introduced by McCulloch and Pitts [ ], and by Rosenblatt in the form of a simple perceptron [ ], ANNs have gained major traction in the AI community. ANNs are highly applicable in
the domain of statistical machine learning in which they are trained to perform various tasks, and outperform humans in
quite a few of these tasks [ ].
Figure 3: ANN architecture (left, by Colin Burnett, from [ ]). Considering these common machine learning architectures, it is clear that there is a lack of any reflective ‘loop’. Although these achieve different outcomes, they are qualitatively equivalent in the sense that they both operate at a single level of abstraction when it comes to information processing. There is no self-reference: the loops in both cases are for feedback, in much the same way that the Critic Agent operates. Additionally, even though Kolb’s model of experiential learning is a model of learning in humans, it also presents (albeit at a high level) qualitative processes that ANNs and GANs do not.
However, like any supervised learning model, ANNs’ over-reliance on historical data means that they learn to repeat what has been done, not what ought to be done. Coupled with their largely black-box
nature, this leads to a propagation of existing systemic biases that is difﬁcult to identify or address. Post-hoc methods to
interpret and ‘explain’ ANN-based models, such as LIME and SHAP [ ], result not in an explanation of the internal
mechanics of ANNs, but approximations in the form of equivalent interpretable models. For instance, in order to explain
a deep-ANN, a decision tree or a heat-map are generated as an approximate function between the inputs and the outputs
of the deep-ANN. This may, perhaps, be seen as a form of external, open-loop reflection (but typically by others, not by the system itself); in and of itself, the architecture of (feed-forward) ANNs has no capability for reflection.
Generative Adversarial Network Architectures (GAN)
GANs [ ] are one recent example of how ANNs can be
used as building blocks within an explicitly designed architecture. These pitch two multi-layered perceptrons (ANNs)
against each other in a 2-player minimax game. The higher-level architecture here captures the sort of competitive
creative co-adaptation found within co-evolutionary systems.
When it comes to the human ability of reﬂection, GANs by themselves are incapable of representing the process. While
their architecture contains a feedback loop, it does not operate at the meta level: the architecture is ‘ﬂat’. It is not
generally considered that a GAN (or coevolution in general) adds any type of high-level cognitive process. While
ANNs in general are just clusters of interconnected nodes with weighted edges, and the same may be said of the brain,
we contend that there are essentially two approaches to generating cognitive processes of this type: one is a complex
systems approach, where the virtual machine [ ] operationalizing cognition emerges through complexity; the
second is through an explicit architecture, as we do in this paper. Because AI agents that solely use ANNs such as these
cannot reﬂect about themselves and the consequences of their actions in the world, they can behave anti-socially with
no ability to know this.
Practical Reasoning Architectures
Regarding the difference between reﬂection and deliberation employed in
practical reasoning, deliberation is a process for thinking out decisions, whereas reﬂection is a higher-level process
that situates the agent that performs deliberation in a context through abstract conceptualisation. Deliberation does not
require self-representation through abstract conceptualisation, because deliberation can be done at symbol-level, e.g.,
implementing deliberation strategies and selecting them using a procedural reflection process (a rather reductive notion of reflection). Procedural reflection is a form of meta-reasoning in PRS, the Procedural Reasoning System (based on Lisp [ ]; see Fig. 4), on which later BDI models ([ ], Fig. 6) were based. The PRS/BDI type of reflection passes symbols from the previous state (or lower symbolic level) to the current one, such that we can say what the system was up to in the previous state [ ]. There is no nested reasoning or self-representation, both of which are crucial for many forms of reflection.
Figure 4: PRS architecture (Source: [ ]). The PRS architecture is similar to the Critic agent architecture in the sense that it allows us to break down different qualitative processes. The difference between the Critic architecture and the PRS is that the PRS does not include a learning component, but it has a richer representation of the processes and elements responsible for driving the reasoning behind the actions that are executed in the environment. PRS also allows for a learning component to eventually be plugged into the system interface, feeding data into the belief base.
Even procedural reflection has only recently been implemented using more recent architectures such as BDI. According to a recent survey on BDI agents [ ], only one paper was identified that implemented reflection in BDI architectures. In that work [ ], BDI agents can use system-wide instructions to identify the context in which they operate, and this enables them to use a rather reductive notion of reflection, called procedural reflection, to select deliberation strategies [ ].
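To make the ‘reductive’ flavour of procedural reflection concrete, the following invented sketch (not the cited BDI implementation) shows a meta-level that merely inspects symbols describing the agent’s own context in order to select among deliberation strategies:

# Invented illustration of 'procedural reflection' in the reductive sense:
# the meta-level inspects symbols describing the agent's current context and
# selects among deliberation strategies, without any nested self-model.

def cautious_deliberation(options):
    return min(options, key=lambda o: o["risk"])

def fast_deliberation(options):
    return options[0]                      # just take the first applicable option

STRATEGIES = {"low_battery": cautious_deliberation,
              "normal": fast_deliberation}

def procedural_reflection(context_symbols):
    """Map symbols about the system's own state to a deliberation strategy."""
    return STRATEGIES["low_battery" if "low_battery" in context_symbols else "normal"]

options = [{"name": "sprint", "risk": 0.9}, {"name": "walk", "risk": 0.2}]
strategy = procedural_reflection({"low_battery"})
print(strategy(options)["name"])           # -> walk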
Regarding the distinction between active experimentation and exploration, active experimentation is also a multifaceted process that includes at least exploration and active learning at the meta-level, and also intentional reconceptualizations of existing knowledge, i.e. Dennett’s Tower of Generate-and-Test [ ], in order to be able to reflect on the value of that knowledge.
Domain Expert Systems
These, such as tutoring expert systems, do not replicate reflection beyond a rudimentary sense either. Advanced domain expert tutoring systems that are based on architectures like ACT-R [ ] and that implement some learning theory are reflective, but only in the procedural, Lisp-style sense that we explained above (see Fig. 5) [ ]. Another issue with systems like ACT-R is that they are architectures for domain expert systems where
the environment is part of the system, not agent-based architectures like the critic agent architecture (Fig. 2) where
agents act in an observed environment.
To summarise, the ANN architectures discussed above do not allow for reflection to be captured. By contrast, PRS, BDI, and ACT-R do not exclude it, but neither do they explicitly describe it.
6 Building Reﬂective AI Agents
In order to make an agent reﬂective, thus expanding the list of Boden’s ‘sorts of things’, we ﬁrst need an architecture.
We must separate out reﬂection from decision making and action.
Figure 5: ACT-R architecture [ ]. It is crucial to note that ACT-R is not an AI agent architecture, but rather a cognitive architecture that was used as an expert system. The original purpose of ACT-R is to map and understand human cognition as a set of modular components that execute procedures to produce behaviour in a specific domain. ACT-R assumes that all cognitive components are represented and driven by declarative and procedural memory.
Second, we need a suite of reﬂective cognition processes that may be included depending on the form of reﬂection
desired. A given instance of a reﬂective agent may have one or more or all of these processes, in line with the
differentiated theory approach. We categorise these processes in four tiers:
Tier 1 Reﬂective Agent:
This incorporates models of self and others, and a process to reason using these models in
order to ask itself what-if questions concerning generated actions. This enables a Popperian-style consequence engine
and reﬂective governance process, able to evaluate proposed actions in context (acknowledging that context can change)
and at least block some actions.
Tier 2 Reﬂective Agent:
Adds processes that learn new reﬂective models, including incorporating feedback from new
experiences into them incrementally. This addition enables Kolb-style reﬂective experiential learning.
Tier 3 Reﬂective Agent:
Adds a reﬂective reasoning process that proposes not only a single ‘optimal’ solution, but
is ready to present a diversity of possible ways forward – hypotheses to be tested – based on different approaches to
solving the problem (including safe ways of disengaging from it).
There are many ways an agent may generate proposed actions. These vary in complexity substantially, from simple
randomised search (e.g. mutation or exploration) through to heuristic and guided search approaches, up to potentially
advanced forms of artiﬁcial creativity and strategic planning.
These approaches provide the ability to deliberate about novel possible strategies for action, including in new or
potential (imagined) situations, and to evaluate these internally by reasoning with the reﬂective models.
Tier 4 Reﬂective Agent:
Adds the ability to re-represent existing learnt models in new ways. This facilitates new reasoning possibilities and the potential for new insights. It provides a Gregorian-style ability to change the way the agent thinks, not only what it thinks about.
Third, we need a way of representing the broader context: we need models of prevailing norms, and of the social values
associated with the outcomes of different possible actions, and of other higher-level goals that may not be immediately
or obviously relevant to the task.
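As a minimal, purely illustrative sketch of how these ingredients (reflective models, a reflective reasoning and governance process, and higher-level extrinsic goals) might be separated in code, consider the following; all class, method, and norm names are our own inventions, not a prescribed API.

# Sketch of the separation between acting, learning, and reflection.
# All names are illustrative; none of this is a prescribed API.

class ReflectiveModels:
    """Models of self, others, and context, built by reflective learning."""
    def __init__(self):
        self.norms = {"forbidden": {"recommend_dangerous_game"}}

    def update(self, observation):
        pass                                   # e.g. incremental model learning

class ReflectiveReasoning:
    """Popperian-style check: evaluate a proposed action in context."""
    def __init__(self, models, extrinsic_goals):
        self.models, self.extrinsic_goals = models, extrinsic_goals

    def permits(self, proposed_action):
        return proposed_action not in self.models.norms["forbidden"]

class ReflectiveAgent:
    def __init__(self):
        self.models = ReflectiveModels()
        self.reflection = ReflectiveReasoning(self.models, extrinsic_goals=["be_safe"])

    def propose(self, observation):
        """Stand-in for the performance element of a standard learning agent."""
        return "recommend_dangerous_game" if "bored_child" in observation else "recommend_puzzle"

    def step(self, observation):
        self.models.update(observation)
        action = self.propose(observation)
        if not self.reflection.permits(action):   # Tier 1: block and ask for an alternative
            action = "ask_for_alternative"
        return action

print(ReflectiveAgent().step({"bored_child"}))     # -> ask_for_alternative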
Note that these components are mostly not new, but it is their novel combination and integration that provides new
capability. Indeed, there are now several decades of work on reﬂective architectures, including early work like Landauer
and Bellman’s Wrappings [ ], and Brazier and Treur’s [ ] specification for agents that can reason reflectively about information states. More recently, Blum et al’s Consequence Engine architecture [ ], the EPiCS architecture [ ], and the LRA-M architecture [ ] are all aimed explicitly at achieving computational self-awareness through reflection.
Figure 6: BDI architecture (by Jomi F. Hübner, source: [ ]). The BDI architecture was designed to help AI agent designers build intuitive and interpretable AI agents capable of practical reasoning. The architecture depicts different qualitative processes and elements responsible for meta-reasoning (deliberation) and belief revision (BRF), which then help the agent decide what to do in a given circumstance in order to achieve their goals/desires in a dynamic environment.
On this broader point, self-awareness, often considered as the capacity to be ‘the object of one’s own attention’ [ ], has long been targeted as a valuable property for computational systems to possess [ ], owing to the value of its functional role and evolutionary advantage in biological organisms [ ]. Computational forms of self-awareness require reflective processes that access, build, and operate on self-knowledge [ ]. This self-knowledge is typically described according to five ‘levels of self-awareness’ [ ], rooted in the work of [ ], although it may consider many other aspects [ ]. In some cases these are trivial self-models, for example a smartphone may have an
internal parameter that captures whether its charging port contains moisture. Slightly more complex, the device may
learn an internal model of its typical charging behaviour, sufficient to act meaningfully upon, and this may adapt as the battery degrades. In more complex examples still, a cyber-physical system may have a model of available resources discovered at run-time [ ].
Learning and reasoning with self-knowledge requires a reflective self-modelling process [ ] of the type described here. The exact form of such learning and self-modelling will vary depending on requirements and situation, but some examples include self-modelling based on abstraction from run-time data (e.g., [ ]), or simulation of oneself in the environment [ ]. As Blum et al demonstrate, such simulations may be used as ‘consequence engines’, similarly to how Hesslow describes the ability of the human brain to execute processes of internal cognitive simulation.
The LRA-M model proposed by Kounev et al [ ] (Figure 7), a commonly used reflective architecture that comes from the area of self-adaptive systems research, captures computational reflection at an abstract level. However, this leaves unclear several aspects associated with agents – e.g., what process generates the actions? Comparing this with a standard learning-based Critic agent [ ], we can see the inverse is true: learning and action selection are present, but reflection
is not (see Figure 2).
Hence, here we propose one way to integrate the architecture of learning agents with the reﬂective schema captured
by Kounev et al. In this way, a reﬂective architecture enables information to be abstracted and reasoned with at the
meta-level, feeding back to update goals for learning, and to regulate behaviour.
Figure 7: LRA-M Reflective Architecture. Source: [ ]. The Learn-Reason-Act-Model (LRA-M) model was designed as a reference architecture to capture the essential components of computational self-awareness and their relations. Strictly speaking, acting is considered optional, depicted by the dashed line, though is typically the purpose of the self-awareness in a practical system. The circular arrows signify that learning and reasoning are ongoing processes at run-time, based on streaming data from ongoing observations of the world and oneself. Learning and reasoning also operate on existing internal models, including processes such as re-representation, abstraction, and planning.
We motivate our choice to base our architecture on Russell and Norvig’s [ ] Critic Agent, and further to use Kounev et al’s [ ] reflective loop for discussing reflection in AI, since they enjoy broad understanding and acceptance in the domains of agent architecture and computational reflection, respectively. We choose the critic agent not because it is the best or most state-of-the-art for any particular domain, but because it allows us to illustrate how reflection can be incorporated into a very widely used and understood standard agent architecture. This, we hope, makes the article and argument more
accessible. While there are also many reﬂective loops that we could have chosen, Kounev’s is one that enjoys broad
support, particularly from the self-adaptive systems community. Indeed, the article that presents it was the result of a
large community effort at a Dagstuhl Seminar. Thus, while we acknowledge (and hope) that many other architectures
can be paired with other forms of reﬂective loop, in this article, we use these two as an illustration and ﬁrst step.
What we can now see is missing is a simple, generic schema for how reﬂection relates to existing agent architectures
commonly used in modern AI. We propose such an architecture, as a synthesis of Russell and Norvig’s critic agent and
Kounev et al’s LRA-M architecture for reﬂective computational self-awareness, illustrated in Figure 8.
The advantage of an architectural approach is that it describes a separate set of processes, and we know that building
systems that self-monitor is easier using an ‘external layer’ style [ ]. Further note that what we propose is not an
architecture that passes on information from one module to another as is the case in numerous hybrid approaches that
aim to marry symbolic and sub-symbolic models [ ]. What we propose is a cognitive architecture for reflection which
can interpret information before passing it from one module to another. The interpretation of information is dynamic
and happens in multiple processes. Below we describe how our proposed architecture ensures information interpretation
at different cognitive levels through process ﬂows.
There are many information ﬂows enabled by the addition of this reﬂective capability. Here, we sketch some of the most
obvious and perhaps important ones, particularly those that link to the conceptual discussion above. We categorised the
ﬂows according to the corresponding tiers of reﬂective agents:
Figure 8: Proposed Reflective Agent Architecture.
Tier 1 Flow – Governance:
• Flow 1: Governing Behaviour:
Actuators → Reflective Reasoning → Actuators
E.g. intervening to prevent an intended action.
Tier 2 Flows – Integrating experience and external factors:
• Flow 2: Abstract Conceptualization of Experience
Sensors → Reflective Learning → Reflective Models → Reflective Reasoning → Critic.
E.g. Kolbian experiential learning through conceptualization of new and changing experiences.
• Flow 3: Learn about and integrate new extrinsic factors into operational goals:
Higher-Level Extrinsic Goals
E.g. learning about and integrating new external factors, such as social norms, standards, and new user
preferences, discovered in the environment, such as signs, verbal instructions, and observation of behaviour.
• Flow 4: Integrate new design goals into existing reﬂective models and operational goals:
Higher-Level Extrinsic Goals → Reflective Models → Reflective Reasoning → Critic.
E.g. learning about and integrating new external factors, such as social norms, standards, and new user
preferences, discovered in the environment, such as signs and verbal instructions.
Tier 3 Flows – Critique and Imagination:
• Flow 5: Active Experimentation to Improve Potential Behaviour
Actuators → Reflective Reasoning → Critic → Learning Element → Performance Element → Actuators
E.g. Using the information that an action was intervened upon in order to adapt what the learning element
learns, and hopefully avoid the situation in future. Or, creatively proposing novel courses of action and testing
hypotheses regarding them.
• Flow 6: Reﬂecting on effectiveness of current operational goals and progress towards them:
Reflective Reasoning → Critic → Reflective Reasoning.
E.g. Counterfactual reasoning about current and potential goals, the ‘what’ of operational learning; black-box reasoning about progress towards them, for example asking ‘am I stuck?’ or ‘would a different reward function better serve my high-level goals?’
• Flow 7: Reﬂecting on the current mechanisms of learning:
Reflective Reasoning → Learning Element → Reflective Reasoning.
E.g. White-box reasoning about current operational learning mechanisms, the ‘how’, for example asking ‘how
am I learning to do this?’ and ‘could I try to learn in a different way?’
Tier 4 Flow – Re-Representation:
• Flow 8: Reﬂective Thinking:
Reflective Reasoning → Reflective Models → Reflective Reasoning
E.g. refactoring and reconciling models, re-representing existing conceptual knowledge, concept synthesis.
One important and powerful insight is that these information ﬂows can be treated as ‘primitives’ and composed to
provide additional and more complex cognitive features. For example, the composition of Flows 2 & 6 could give rise
to curiosity-driven behaviour, while adding Flow 8 to this allows the result of the curiosity to be integrated into existing
knowledge. Similarly, Flows 1 & 8 support reﬂecting on behaviour governance, for example reconciling competing
imperatives, assessing the effectiveness of an intervention, or deliberating over an action. Adding Flow 6 to this, permits
the deliberation to not only act over potential actions, but over potential directions for future learning.
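As a toy illustration of this composition idea (the flows below are reduced to simple functions over an invented agent state, and are not an implementation of the architecture), flows can be composed like any other primitives:

# Toy illustration of treating information flows as composable primitives.
# Each 'flow' is a function from an agent state to an agent state; composing
# them yields more complex reflective behaviour. All details are invented.

def flow2_conceptualise(state):            # abstract conceptualisation of experience
    state["concepts"] = {"novelty": state["observation"].count("new")}
    return state

def flow6_review_goals(state):             # reflect on current operational goals
    if state["concepts"]["novelty"] > 0:
        state["operational_goal"] = "explore"
    return state

def flow8_rerepresent(state):              # integrate the result back into models
    state["models"]["last_insight"] = state["operational_goal"]
    return state

def compose(*flows):
    def composed(state):
        for f in flows:
            state = f(state)
        return state
    return composed

curiosity = compose(flow2_conceptualise, flow6_review_goals)       # Flows 2 & 6
curiosity_with_memory = compose(curiosity, flow8_rerepresent)      # add Flow 8

state = {"observation": ["new", "old"], "models": {}, "operational_goal": "exploit"}
print(curiosity_with_memory(state)["models"])   # -> {'last_insight': 'explore'}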
There are a number of research challenges here. Speciﬁcally, there is a need to understand how to operationalize
the above information ﬂows, including operational semantics, APIs, and methods for the semantic transformation of
information from symbolic to sub-symbolic levels and vice-versa.
7 How Do We Get There?
Many of the individual components required to realize Reﬂective AI already exist. In some cases, the challenge is
to integrate these in a purposeful way to achieve the vision set out above. In other cases, there remain important
fundamental research challenges. In this Section, we outline some of these in key areas.
7.1 Reﬂective Learning
Reﬂective Learning lies conceptually at the core of the proposed architecture. Fundamentally, learning here provides
two forms of modelling capabilities: abstraction and simulation, which support reasoning in complementary ways.
Abstraction is key in both learning and re-representation and facilitates reflective reasoning: imagine you have two cognitive modules with information passed between them (cf. [ ]). If the agent does not have abstraction capabilities, then it will keep using the two modules to play ping-pong while the ball remains unchanged as it passes from bat to bat. However, if it has processes in the architecture that transform the information, then it is no longer a game of ping-pong, but instead a dialectic where new understanding emerges.
However, abstract conceptualisation capabilities have not entered mainstream AI research yet. As illustrated in Figure 8,
such a reflective process could start from reflective observation, which takes the data output of the Sensors and passes it to the reflective layer, where a reflective learning process transforms this data into concepts that can then
populate various new or updated self-models (abstract conceptualisation). Reasoning over these models can lead to
intentional active experimentation, targeted at generating new experiences to observe, thus continuing the cycle.
Simulation models support a further form of reasoning, over consequences [ ]. This permits Dennett’s ‘Popperian’ mind, where hypothesis generation and testing can be carried out internally to the cognition of the agent, without requiring the world. For example, in the style of [ ], an agent may build a simulation model in the form of
a digital twin of itself in its environment. With sufﬁcient interpretability and accompanied by automated reasoning
processes, this may be complemented with an abstract conceptualisation, for example, that provides the understanding
that the simulation contains an evolutionary stable strategy.
Note that neither the architecture nor the concept of reﬂective learning prescribe a particular learning algorithm. Many
learning techniques can be used. The choice of technique itself is open-ended and can be made to suit the context so
long as it adheres to, we posit, two conditions. First, that it is model-based, such that the process of learning produces a
model that captures some knowledge about the system and its environment. Interpretable models should be favoured, as
the reasoning processes may then operate on these interpretations automatically. Second, that it operates online, such
that it can incrementally build and update these models and be used in an anytime fashion.
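To illustrate these two conditions, and nothing more, the following sketch shows a minimal online, model-based, anytime self-model, echoing the earlier smartphone charging example; the update rule and numbers are arbitrary choices of ours:

# Sketch of a minimal online, model-based, anytime self-model: an exponentially
# weighted estimate of the device's own charging time that tracks concept drift
# (e.g. as the battery degrades). The numbers and rate are arbitrary.

class ChargingSelfModel:
    def __init__(self, rate=0.2):
        self.rate = rate                  # how quickly old experience is forgotten
        self.expected_minutes = None      # the learned self-knowledge

    def observe(self, minutes):
        """Incrementally update the model from one new experience."""
        if self.expected_minutes is None:
            self.expected_minutes = minutes
        else:
            self.expected_minutes += self.rate * (minutes - self.expected_minutes)

    def query(self):
        """Anytime: a usable (if rough) answer is available at any point."""
        return self.expected_minutes

model = ChargingSelfModel()
for observed in [90, 92, 95, 110, 120]:   # battery slowly degrading
    model.observe(observed)
print(round(model.query(), 1))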
[ ] note that online learning algorithms are one of the key ingredients in achieving computational self-awareness. They further note that such online learning must be able to deal with concept drift, since both the system and its environment change. [ ] show how existing online learning algorithms can be used for reflective self-awareness at different levels, but perhaps most importantly, they intentionally do not propose a preferred online learning paradigm, rather highlighting that empirical results suggest that using different learning techniques according to context can lead to enhanced performance. Complementary examples are presented in a collection edited by [ ], who arrives at a similar conclusion.
In the future, given a mechanism for representing concepts [ ], an AI agent could use Kolbian abstract conceptualisation
to form new concepts and more meaningful models of itself and others in a shared system. Simultaneously, an agent
could build simulation models of itself in its environment, to enable Popperian hypothesis testing. Both model forms
provide complementary benefits [ ] as forms of reflective modelling for meta-reasoning [ ], and in different ways, require the ability to learn models on-the-fly [ ].
There is a need to develop mechanisms that learn human- and machine-interpretable conceptual
and simulation models from empirical data and semantic information in the world, and further, to develop (unsupervised)
methods for this to be done on the ﬂy in a complex environment.
7.2 Reﬂective Governance
The proposed architecture captures Socrates’s daemon through a governor loop, as mediator between reﬂective reasoning
and an agent’s actuators. This loop is a process of deliberation at the meta-level. Reﬂection captures this process and
situates it in a context, i.e., in an agent’s model of the self and others in the world through abstract conceptualisation or
simulation. Thus, the system does not need to re-learn its decision model if something in the set of oughts (higher-level
extrinsic goals) in its situation changes – though it might want to, later. It just needs to check the behaviour against
them, and occasionally say ‘no, that’s not appropriate; give me an alternative, try a different approach.’
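A minimal sketch of such a governor loop might look as follows; the policy, the ‘oughts’, and the recruiting scenario are invented placeholders, and the point is only that the set of oughts can change without retraining the underlying decision model:

# Minimal sketch of a governor loop: the set of 'oughts' can change at run-time
# without retraining the underlying policy; proposals are simply re-checked.

def policy(situation, attempt):
    """Stand-in for a learned decision model proposing actions in order of preference."""
    preferences = {"recruit": ["hire_lookalike", "hire_best_qualified", "defer_to_human"]}
    return preferences[situation][attempt]

def governor(action, oughts):
    """Socratic check: allow the thought, but possibly block the act."""
    return all(ought(action) for ought in oughts)

def act(situation, oughts, max_attempts=3):
    for attempt in range(max_attempts):
        proposal = policy(situation, attempt)
        if governor(proposal, oughts):
            return proposal
        # 'no, that's not appropriate; give me an alternative'
    return "defer_to_human"

def no_biased_hiring(action):              # an ought added after deployment
    return action != "hire_lookalike"

print(act("recruit", oughts=[no_biased_hiring]))    # -> hire_best_qualified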
Regarding the ethical nature of this, explicitly ethical agents are nothing new, at least since Moor’s [ ] way of discerning four different ‘types’. Indeed, the question of imbuing artificial agents with ethical values was the topic of a special issue of Proceedings of the IEEE. Winfield et al’s summary [ ] and Cervantes et al’s survey [ ] provide an introduction. And indeed, Winfield and colleagues provided an early example [ ] of putting these kinds of ‘ethical governors’ into robots, as consequence engines [ ]; concerns also exist about whether explicit ethical agents are a good idea [ ].
There is a need to develop inclusive, participatory methods for capturing values, norms, and
preferences in formal, interpretable models that can be translated for use a) in a critic module to drive learning, b) as
part of the behaviour governance process, and c) that respects the diversity of interpretation of human values that exists.
There is a further need to develop governance and learning processes that adopt these in order to generate and ensure
behaviour is aligned with them, as emphasised throughout the Royal Society Special Issue edited by [ ].
7.3 Reﬂective Deliberation
Going deeper still, agents could extend the above with reﬂective deliberation. Reﬂective agents can deliberate by using
active experimentation between reﬂective reasoning and critic (see Figure 8) from time to time to ﬁnd alternative ways
of approaching problems. When considering ﬁnding multiple possible diverse and viable courses of action, we can draw
on the rich and active research activity on dialogues, practical reasoning and value-based argumentation [ ]. These could help us to find new, different solutions that come at a problem from a novel angle. And when evaluating
these alternatives, we may choose to formulate the very notion of what ‘successful’ means according to our values; and
in adopting these we must acknowledge that the best action may be a compromise. To instantiate this sort of reﬂection,
agents could employ value-based practical reasoning mechanisms such as action-based alternating transition systems with values (AATS+V) or dialogue protocols [ ]. In turn, these are used to build argument schemes [ ] which agents can use both for reflecting on their possible decisions and for justifying their decisions by providing explanations [ ].
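As a toy illustration, loosely inspired by value-based practical reasoning but in no way an implementation of AATS+V or of the cited argument schemes, alternatives can be compared against an explicit value ordering and the preferred compromise explained:

# Toy value-based comparison of alternative actions, loosely inspired by
# value-based practical reasoning (not an implementation of AATS+V).
# Each candidate promotes or demotes named values; a simple value ordering
# decides which compromise is preferred. All values and actions are invented.

VALUE_ORDER = ["safety", "fairness", "efficiency"]       # most important first

CANDIDATES = {
    "ship_now":         {"efficiency": +1, "safety": -1},
    "ship_after_audit": {"efficiency": -1, "safety": +1, "fairness": +1},
    "disengage":        {"safety": +1, "efficiency": -1},
}

def score(effects):
    """Lexicographic-ish score: earlier values in the ordering weigh more."""
    weights = {v: 10 ** (len(VALUE_ORDER) - i) for i, v in enumerate(VALUE_ORDER)}
    return sum(weights[v] * effect for v, effect in effects.items())

def deliberate(candidates):
    ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
    best = ranked[0]
    explanation = f"{best} preferred because it best serves {VALUE_ORDER[0]}"
    return best, explanation

print(deliberate(CANDIDATES))      # -> ('ship_after_audit', '...')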
Agents need to be able to perform internal simulations of their actions and check the outcomes of
these actions inside their own mind in order to perform deliberation. There is therefore a need to develop semantics and
nested abstract models of the world for agent architectures to enable agents to go beyond the procedural reﬂection of
BDI and PRS-like systems, by having the capability to run, analyze, and interpret new simulation models on the ﬂy,
according to need. One idea could be to develop polymorphic simulation models that can be instantiated into specific simulations based on the learnt concepts and the need at hand.
7.4 Social Context
Mentalistic capabilities, as we have explained in the chess example, play an important role in reflecting on one’s
complex decisions. Again, BDI-like agents can be given both the ability to communicate their decisions to other
agents as well as the ability to model the minds of other agents inside their own cognitive architecture in order to
better coordinate, or even delegate tasks [ ]. Social interactions can be modelled and implemented with dialogue frameworks so that agents can explain and justify their behaviour [ ]. Modelling social context is a rich research
ﬁeld. Formal models of norms can be captured using deontic logic; research in normative systems considers the
capturing of norms in agents [ ] and human-robot interactions [ ]. Social context also includes social values represented in Higher-Level Extrinsic Goals. Solution concepts [ ] give us one way to formalise these. These
can be directly learned at the reﬂective layer by the agent through reﬂective learning. An AI system able to reﬂect
on its actions in terms of social context would need to draw on formal models such as these. Work on normative
reasoning in open MAS could play a crucial role, ranging from negotiation between individuals to engineering electronic
institutions [80, 81, 82].
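A toy sketch of such a norm representation and compliance check (far simpler than deontic logic or the cited normative-systems work; the norms, roles, and actions are invented) might be:

# Toy norm representation and compliance check in the spirit of (but far
# simpler than) deontic/normative-systems approaches. Norms, roles, and
# actions are invented placeholders.

NORMS = [
    {"modality": "prohibition", "role": "robot", "action": "enter_kitchen",
     "context": {"humans_present": True}},
    {"modality": "obligation",  "role": "robot", "action": "announce_presence",
     "context": {"humans_present": True}},
]

def active(norm, role, context):
    return norm["role"] == role and all(context.get(k) == v for k, v in norm["context"].items())

def check(plan, role, context):
    """Return norm violations for a proposed plan in the current social context."""
    violations = []
    for norm in (n for n in NORMS if active(n, role, context)):
        if norm["modality"] == "prohibition" and norm["action"] in plan:
            violations.append(("violates prohibition", norm["action"]))
        if norm["modality"] == "obligation" and norm["action"] not in plan:
            violations.append(("misses obligation", norm["action"]))
    return violations

print(check(plan=["enter_kitchen"], role="robot", context={"humans_present": True}))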
There is a need to develop the semantics and nested abstract models to reﬁne the approaches
described in [ ], by integrating socio-cognitive, communication and normative components inside the
instantiated internal simulations. Reﬂective agents should be able to also simulate the minds and behaviours of other
agents and organisations in various contexts where different norms are active.
7.5 Meaningful Interaction
If a machine is sufﬁciently endowed with all the previous reﬂective capabilities, then the ultimate capability of such a
reﬂective machine would be to interact meaningfully with humans. In Hamblin’s words, this means to have a machine
‘worth talking to’ [ ]. One way (and there are others) we might want to judge if a machine is worth talking to, is by
having a machine truly pass a Turing test. To be able to do this, this machine must be capable of human-like deception.
There have been cases of machines that have successfully deceived judges to pass the Turing test, most notably the
Loebner prize winners. However, the type of deception employed by these machines makes us doubt their intelligence
and reﬂective capabilities. If machines are designed and pre-programmed by humans to pass the Turing test, is the
deception performed by humans or the machine? The Loebner prize winning machines are simple chatterbots, which
follow a pre-defined script given to them by their designers [ ]. These machines do not possess the ability to deliberate
in order to decide what to say, how to say it, or when to say it in different circumstances. Such machines are merely
vessels of deception, performed by humans, and incapable of tricking Turing test judges into believing that they are
intelligent on their own. In the case of Large Language Models (LLMs), these are the equivalent of ‘stochastic parrots’ [ ].
On the other hand, instead of applying script-based policies or ‘parroting’ algorithms for tricking humans into believing
they are engaging in meaningful interactions, reﬂective machines might actually be able to use their abstract models
of the world and others to give semantics to their utterances or even to their non-linguistic behaviour, similarly to
the deceptive AI agents proposed in [ ], which have internal ‘consequence engines’ that simulate the outcomes of
communicative interactions w.r.t. false beliefs formed in the minds of their interlocutor agents. These sorts of agents
not only have the ability to model other agents behaviourally, as explored in the special issue edited by [ ], but have
the ability to use an Artiﬁcial Theory of Mind to reason about consequences of their actions on the minds of others. For
instance, the agents in [ ] use a combination of simulated theory of mind (ST) and theory-theory of mind (TT) to
reason about how they can cause changes in the beliefs of others and reason about the consequences of these belief
changes. Similarly, Winﬁeld’s robots use it to predict the actions of other agents and anticipate the likely consequences
of those actions both for themselves and the other agents [ ].
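As a crude, invented sketch of this kind of reasoning (a stand-in for an artificial theory of mind, not the cited agents), an agent might simulate how an utterance would change an interlocutor’s beliefs and check whether it induces false ones:

# Toy sketch of reasoning about the consequences of an utterance on another
# agent's beliefs (a crude stand-in for an artificial theory of mind; the
# belief model and update rule are invented).

WORLD = {"door_locked": True}

def simulate_listener_belief(current_beliefs, utterance):
    """Theory-theory-style rule: the listener tends to believe what is asserted."""
    predicted = dict(current_beliefs)
    predicted[utterance["about"]] = utterance["claim"]
    return predicted

def consequence_engine(utterance, listener_beliefs):
    """Would this utterance leave the listener with a false belief?"""
    predicted = simulate_listener_belief(listener_beliefs, utterance)
    return {prop: val for prop, val in predicted.items() if WORLD.get(prop) != val}

honest = {"about": "door_locked", "claim": True}
deceptive = {"about": "door_locked", "claim": False}
listener = {"door_locked": True}
print(consequence_engine(honest, listener))      # -> {} (no false beliefs induced)
print(consequence_engine(deceptive, listener))   # -> {'door_locked': False}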
Taking Dennett’s perspective on what language does for Popperian agents, tackling this research challenge also means
enabling the design of AI agents that can use language and arguments to ‘set the stage’ for their consequence engines
and contrast and compare the results of their internal simulations, not just inside their own minds, but together with
humans. Such Popperian AI agents will then be capable of using language to communicate relevant information [ ].
Finally, this would allow humans and reﬂective agents to make sense of the world together. Such a future would mean
that AI would cease to be merely a tool for scientiﬁc discovery and start acting as a scientist by fostering deliberation,
debate, and proposing testable explanations of surrounding phenomena, in line with [ ]’s description of explainable AI, and similar to Boden’s idea of transformationally creative AI.
Since we mentioned the ability to use language meaningfully for deception, this could also imply the creation of
Gregorian agents that use language to shape the models of their world in various explorative ways. One example of
meaningful interaction between human and AI Gregorian agents would be using their consequence engines to explore
various fictitious models that could serve as world-building, or story-building. [ ] even provides an ethical account
for deceptive machines to be used in the scope of entertainment. Gregorian AI agents would go beyond fostering
debate about observable facts in the physical world, but could instead foster exercises of imagination and creativity, and perhaps even new philosophical perspectives on our value systems. This latter potential of Gregorian AI agents touches on Boden’s concept of Transformational AI, but compared to the Popperian agents, Gregorian agents are also able to evaluate social constructs that emerged from language use.
When agents talk about things in a human-interpretable manner they need to make sense, not talk
nonsense or blabber. First, work on speech-acts and agent communication languages has to be further developed to
enable agents to extract and refer to linguistic semantics from their abstract models of the world [ ]. Second, methods
based on natural language processing and argumentation [ ] need to be developed for agents to be able to perform abstract conceptualisation from linguistic data. Finally, there is a need to integrate dialogue-based argumentation [ ] for reflective agents to be able to form sound and consistent arguments, and even tell meaningful stories when interacting with others [ ], without resorting to pre-defined scripts [ ] or learnt statistical patterns [ ].
Much research in AI is concerned with breaking a problem down until its constituent parts are solvable; this is important
work. Conversely, linking these things together again in an agent-centric fashion to create the sorts of complex mind-like
phenomena that motivated us in the ﬁrst place, is just as crucial. As we have sketched above, there is a lot to draw on in
conceiving and building reﬂective AI systems. Yet a lot of research remains in understanding how to put together the
pieces of the puzzle. Some aspects of reﬂection are present in the established agent architectures and argumentation
models for normative reasoning, deliberation, practical reasoning, and communication. After all, reﬂection is a crucial
component of social interaction, cooperation, and reasoning about what others know and how they might act in different situations.
Reﬂective AI will be no silver bullet to the problems raised at the beginning of this paper. Yet it may be a step towards
more deliberate, careful, and trustworthy AI, including in those sorts of contexts, if we want to take it.
Finally, we ought to clarify that reflective AI would absolutely not be an excuse to avoid doing AI responsibly, nor does it represent a technical fix to what is fundamentally a social problem. Delegating reflective mental capabilities does not and cannot obviate human responsibility, nor should it distract from it. For example, when building and deploying
AI systems, sadly too little attention is often paid to making them context-sensitive, to understanding stakeholders
and operational conditions, to requirements analysis, to understanding bias in data and how it might be ampliﬁed, to
transparency about training sets, and to interpretability. What we are proposing here is not an either-or. But it is, we
hope, a step towards a more complete, less unbalanced conceptualisation of AI.
 Margaret A Boden. AI: Its Nature and Future. Oxford University Press, 2016.
Virginia Dignum and Frank Dignum. Agents are dead. Long live agents! In Proceedings of the 19th International
Conference on Autonomous Agents and MultiAgent Systems, pages 1701–1705, 2020.
Ron Sun. Cognitive science meets multi-agent systems: A prolegomenon. Philosophical Psychology, 14(1):5–28, 2001.
 Antonio Lieto. Cognitive design for artiﬁcial minds. Routledge, 2021.
 Jeremy Pitt, editor. The Computer After Me. Imperial College Press / World Scientiﬁc, 2014.
M Tine. Uncovering a differentiated Theory of Mind in children with autism and Asperger syndrome. PhD thesis,
Boston College, USA, 2009.
Richard Bellman. An introduction to artificial intelligence: Can computers think? Boyd & Fraser, San Francisco, 1978.
Dagmar Monett, Colin W. P. Lewis, Kristinn R. Thórisson, Joscha Bach, Gianluca Baldassarre, Giovanni Granato,
Istvan S. N. Berkeley, François Chollet, Matthew Crosby, Henry Shevlin, John Fox, John E. Laird, Shane Legg,
Peter Lindes, Tomáš Mikolov, William J. Rapaport, Raúl Rojas, Marek Rosa, Peter Stone, Richard S. Sutton,
Roman V. Yampolskiy, Pei Wang, Roger Schank, Aaron Sloman, and Alan Winﬁeld. Special issue “on deﬁning
artiﬁcial intelligence” – commentaries and author’s response. Journal of Artiﬁcial General Intelligence, 11:1–100,
Peter R. Lewis and Stephen Marsh. What is it like to trust a rock? a functionalist perspective on trust and
trustworthiness in artiﬁcial intelligence. Cognitive Systems Research, 72:33–49, 2021.
Reuters. Amazon ditched AI recruiting tool that favored men for technical jobs. The Guardian, Oct. 11, 2018.
Donald Russell, George Cawkwell, Werner Deuse, John Dillon, Heinz-Günther Nesselrath, Robert Parker,
Christopher Pelling, and Stephan Schröder. On the daimonion of Socrates: Plutarch. SAPERE. Mohr Siebeck
GmbH and Co. KG, 2010.
Plato (translated by Paul Shorey). Plato in Twelve Volumes, volume 5 & 6. Harvard University Press, Cambridge,
MA, USA, 1969.
 Daniel C Dennett. The role of language in intelligence. Walter de Gruyter, 2013.
Germund Hesslow. Conscious thought as simulation of behaviour and perception. Trends in cognitive sciences,
 Germund Hesslow. The current status of the simulation theory of cognition. Brain research, 1428:71–79, 2012.
 Donald A. Schön. The Reﬂective Practitioner: How Professionals Think In Action. Basic Books, 1984.
David. A. Kolb. Experiential learning: Experience as the source of learning and development. Prentice-Hall,
Englewood Cliffs, N.J., USA, 1984.
Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. 4th (global) edition. Pearson.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,
and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
Michael P Georgeff and Amy L Lansky. Reactive reasoning and planning. In AAAI, volume 87, pages 677–682, 1987.
John R Anderson, Michael Matessa, and Christian Lebiere. ACT-R: A theory of higher level cognition and its
relation to visual attention. Human–Computer Interaction, 12(4):439–462, 1997.
Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin
of mathematical biophysics, 5(4):115–133, 1943.
Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain.
Psychological review, 65(6):386, 1958.
 Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Christopher J Anders, and Klaus-Robert Müller.
Explaining deep neural networks and beyond: A review of methods and applications. Proceedings of the IEEE, 109(3):247–278, 2021.
Aaron Sloman and Ron Chrisley. Virtual machines and consciousness. Journal of Consciousness Studies, 10(4–5):133–172, 2003.
Aaron Sloman. Virtual machine functionalism: The only form of functionalism worth taking seriously in
philosophy of mind. 2013.
 Aaron Sloman. What is it like to be a rock? 1996.
Brian Cantwell Smith. Reflection and semantics in Lisp. In Proceedings of the 11th ACM SIGACT-SIGPLAN
Symposium on Principles of Programming Languages, pages 23–35, 1984.
Anand S Rao, Michael P Georgeff, et al. BDI agents: From theory to practice. In ICMAS, volume 95, pages 312–319, 1995.
Brian Cantwell Smith. Procedural reflection in programming languages. PhD thesis, Massachusetts Institute of Technology, 1982.
Lavindra De Silva, Felipe Meneguzzi, and Brian Logan. BDI agent architectures: A survey. International Joint
Conferences on Artificial Intelligence, 2020.
Sam Leask and Brian Logan. Programming agent deliberation using procedural reflection. Fundamenta Informaticae, 158(1-3):93–120, 2018.
Daniel C Dennett. Why the law of effect will not go away. Journal for the Theory of Social Behaviour, 5:169–187, 1975.
Christopher Landauer and Kirstie L. Bellman. Wrappings for software development. In Proceedings of the
Thirty-First Hawaii International Conference on System Sciences, volume 3, pages 420–429, 1998.
Frances Brazier and Jan Treur. Formal specification of reflective agents. In IJCAI ’95 Workshop on Reflection, 1995.
Christian Blum, Alan F. T. Winfield, and Verena V. Hafner. Simulation-based internal models for safer robots.
Frontiers in Robotics and AI, 2018.
Samuel Kounev, Peter Lewis, Kirstie Bellman, Nelly Bencomo, Javier Camara, Ada Diaconescu, Lukas Esterle,
Kurt Geihs, Holger Giese, Sebastian Götz, Paola Inverardi, Jeffrey Kephart, and Andrea Zisman. The notion of
self-aware computing. In Samuel Kounev, Jeffrey O. Kephart, Aleksandar Milenkoski, and Xiaoyun Zhu, editors,
Self-Aware Computing Systems, pages 3–16. Springer, 2017.
Alain Morin. Levels of consciousness and self-awareness: A comparison and integration of various neurocognitive
views. Consciousness and Cognition, 15:358–71, 2006.
John McCarthy. Making robots conscious of their mental states. In Machine Intelligence 15, Intelligent Agents
[St. Catherine’s College, Oxford, July 1995], pages 3–17. Oxford University Press, 1999.
Melanie Mitchell. Self-awareness and control in decentralized systems. In Metacognition in Computation (AAAI Spring Symposium), 2005.
Caio A. Lage, De Wet Wolmarans, and Daniel C. Mograbi. An evolutionary view of self-awareness. Behavioural
Processes, 194:104543, 2022.
Peter R. Lewis, Arjun Chandra, Shaun Parsons, Edward Robinson, Kyrre Glette, Rami Bahsoon, Jim Torresen,
and Xin Yao. A Survey of Self-Awareness and Its Application in Computing Systems. In Proceedings of the
International Conference on Self-Adaptive and Self-Organizing Systems Workshops (SASOW), pages 102–107.
IEEE Computer Society, 2011.
Peter R. Lewis, Arjun Chandra, Funmilade Faniyi, Kyrre Glette, Tao Chen, Rami Bahsoon, Jim Torresen, and Xin
Yao. Architectural aspects of self-aware and self-expressive computing systems. IEEE Computer, 48:62–70, 2015.
P. R. Lewis, M. Platzner, B. Rinner, J. Torresen, and X. Yao. Self-Aware Computing Systems: An Engineering
Approach. Springer, 2016.
Ulric Neisser. The roots of self-knowledge: Perceiving self, it, and thou. Annals of the New York Academy of
Sciences, 818:19–33, 1997.
Peter R. Lewis, Kirstie L. Bellman, Christopher Landauer, Lukas Esterle, Kyrre Glette, Ada Diaconescu, and
Holger Giese. Towards a framework for the levels and aspects of self-aware computing systems. In Samuel
Kounev, Jeffrey O. Kephart, Aleksandar Milenkoski, and Xiaoyun Zhu, editors, Self-Aware Computing Systems,
pages 3–16. Springer, 2017.
K. Bellman, C. Landauer, N. Dutt, L. Esterle, A. Herkersdorf, A. Jantsch, N. TaheriNejad, P. R. Lewis, M. Platzner,
and K. Tammemäe. Self-aware cyber-physical systems. ACM Transactions on Cyber-Physical Systems, 4(4), 2020.
Christopher Landauer and Kirstie L. Bellman. Reflective systems need models at run time. In Sebastian Götz,
Nelly Bencomo, Kirstie L. Bellman, and Gordon S. Blair, editors, Proceedings of the 11th International Workshop
on Models@run.time co-located with 19th International Conference on Model Driven Engineering Languages and
Systems (MODELS 2016), Saint Malo, France, October 4, 2016, volume 1742 of CEUR Workshop Proceedings,
pages 52–59. CEUR-WS.org, 2016.
Kirstie L. Bellman, Christopher Landauer, Phyllis Nelson, Nelly Bencomo, Sebastian Götz, Peter R. Lewis, and
Lukas Esterle. Self-modeling and self-awareness. In Samuel Kounev, Jeffrey O. Kephart, Aleksandar Milenkoski,
and Xiaoyun Zhu, editors, Self-Aware Computing Systems, pages 3–16. Springer, 2017.
Abdessalam Elhabbash, Rami Bahsoon, Peter Tino, Peter R Lewis, and Yehia Elkhatib. Attaining meta-self-
awareness through assessment of quality-of-knowledge. In 2021 IEEE International Conference on Web Services
(ICWS), pages 712–723. IEEE Computer Society, 2021.
Danny Weyns, M Usman Iftikhar, and Joakim Söderlund. Do external feedback loops improve the design of
self-adaptive systems? A controlled experiment. In 2013 8th International Symposium on Software Engineering
for Adaptive and Self-Managing Systems (SEAMS), pages 3–12. IEEE, 2013.
Roberta Calegari, Giovanni Ciatto, and Andrea Omicini. On the integration of symbolic and sub-symbolic
techniques for XAI: A survey. Intelligenza Artificiale, 14(1):7–32, 2020.
Mitchell A Potter and Kenneth A De Jong. A cooperative coevolutionary approach to function optimization. In
International Conference on Parallel Problem Solving from Nature, pages 249–257. Springer, 1994.
Shuo Wang, Georg Nebehay, Lukas Esterle, Kristian Nymoen, and Leandro L. Minku. Common techniques for
self-awareness and self-expression. In Peter R. Lewis, Marco Platzner, Bernhard Rinner, Jim Tørresen, and Xin
Yao, editors, Self-Aware Computing Systems: An Engineering Approach, pages 113–142. Springer, 2016.
Simon T Powers, Anikó Ekárt, and Peter R Lewis. Modelling enduring institutions: The complementarity of
evolutionary and agent-based approaches. Cognitive Systems Research, 52:67–81, 2018.
Frances MT Brazier and Jan Treur. Compositional modelling of reflective agents. International Journal of
Human-Computer Studies, 50(5):407–431, 1999.
Ana-Maria Olteţeanu, Mikkel Schöttner, and Arpit Bahety. Towards a multi-level exploration of human and
computational re-representation in unified cognitive frameworks. Frontiers in Psychology, 10:940, 2019.
 James H. Moor. Four kinds of ethical robots. Philosophy Now, 72:12–14, 2009.
Alan F. Winfield, Katina Michael, Jeremy Pitt, and Vanessa Evers. Machine ethics: The design and governance of
ethical AI and autonomous systems [scanning the issue]. Proceedings of the IEEE, 107(3):509–517, 2019.
José-Antonio Cervantes, Sonia López, Luis-Felipe Rodríguez, Salvador Cervantes, Francisco Cervantes, and Félix
Ramos. Artificial moral agents: A survey of the current status. Science and Engineering Ethics, 26(2):501–532, 2020.
Alan FT Winfield and Marina Jirotka. Ethical governance is essential to building trust in robotics and artificial
intelligence systems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering
Sciences, 376(2133):20180085, 2018.
Dieter Vanderelst and Alan F. T. Winfield. The dark side of ethical robots. In AAAI/ACM Conference on AI Ethics
and Society, pages 317–322, 2018.
Corinne Cath. Governing artificial intelligence: ethical, legal and technical opportunities and challenges. Philosophical
Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376(2133):20180080, 2018.
Katie Atkinson, Trevor Bench-Capon, and Peter McBurney. Multi-agent argumentation for eDemocracy. In
EUMAS, pages 35–46, 2005.
Katie Atkinson and Trevor Bench-Capon. Practical reasoning as presumptive argumentation using action based
alternating transition systems. Artificial Intelligence, 171(10-15):855–874, 2007.
Katie Atkinson and Trevor Bench-Capon. States, goals and values: Revisiting practical reasoning. Argument &
Computation, 7(2-3):135–154, 2016.
Katie Atkinson and Trevor Bench-Capon. Value-based argumentation. Journal of Applied Logics, 8(6):1543–1588, 2021.
Elizabeth I Sklar, Mohammad Q Azhar, Simon Parsons, and Todd Flyr. A case for argumentation to enable
human-robot collaboration. In Proceedings of Autonomous Agents and Multiagent Systems (AAMAS), St Paul, MN, USA, 2013.
Douglas Walton, Christopher Reed, and Fabrizio Macagno. Argumentation schemes. Cambridge University Press, 2008.
Francesca Mosca, Ştefan Sarkadi, Jose M Such, and Peter McBurney. Agent EXPRI: Licence to explain. In
International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems, pages 21–38. Springer, 2020.
Francesca Mosca and Jose Such. ELVIRA: An explainable agent for value and utility-driven multiuser privacy. In
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2021.
Ştefan Sarkadi, Alison R Panisson, Rafael H Bordini, Peter McBurney, and Simon Parsons. Towards an approach
for modelling uncertain theory of mind in multi-agent systems. In International Conference on Agreement
Technologies, pages 3–17. Springer, 2018.
Peter McBurney and Michael Luck. The agents are all busy doing stuff! IEEE Intelligent Systems, 22(4):6–7, 2007.
Louise A Dennis and Nir Oren. Explaining BDI agent behaviour through dialogue. In Proc. of the 20th Interna-
tional Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021). International Foundation for
Autonomous Agents and Multiagent Systems (IFAAMAS), 2021.
Natalia Criado, Estefania Argente, and V Botti. Open issues for normative multi-agent systems. AI Communications, 24(3):233–264, 2011.
Stephen Cranefield and Bastin Tony Roy Savarimuthu. Normative multi-agent systems and human-robot interaction.
In Workshop on Robot Behavior Adaptation to Human Social Norms (TSAR), 2021.
 Sevan Gregory Ficici. Solution Concepts in Coevolutionary Algorithms. PhD thesis, Brandeis University, 2004.
Carles Sierra, Nick R Jennings, Pablo Noriega, and Simon Parsons. A framework for argumentation-based
negotiation. In International Workshop on Agent Theories, Architectures, and Languages, pages 177–192.
Carles Sierra, Juan Antonio Rodriguez-Aguilar, Pablo Noriega, Marc Esteva, and Josep Lluis Arcos. Engineering
multi-agent systems as electronic institutions. European Journal for the Informatics Professional, 4(4):33–39.
Jeremy Pitt, Julia Schaumeier, and Alexander Artikis. Axiomatization of socio-economic principles for self-
organizing institutions: Concepts, experiments and challenges. ACM Transactions on Autonomous and Adaptive
Systems (TAAS), 7(4):1–39, 2012.
Natalia Criado, Estefania Argente, and V Botti. A BDI architecture for normative decision making. In Proceedings
of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2010.
 Natalia Criado. Using norms to control open multi-agent systems. AI Communications, 26(3):317–318, 2013.
Phillip Staines. Linguistics and the Parts of the Mind: Or how to Build a Machine Worth Talking to. Cambridge
Scholars Publishing, 2018.
Michael L Mauldin. Chatterbots, tinymuds, and the Turing test: Entering the Loebner prize competition. In AAAI,
volume 94, pages 16–21, 1994.
Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of
stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness,
Accountability, and Transparency, pages 610–623, 2021.
 Stefan Sarkadi. Deception. PhD thesis, King’s College London, 2021.
Stefano V Albrecht and Peter Stone. Autonomous agents modelling other agents: A comprehensive survey and
open problems. Artificial Intelligence, 258:66–95, 2018.
Alan FT Winfield. Experiments in artificial theory of mind: From safety to story-telling. Frontiers in Robotics
and AI, 5:75, 2018.
 Deirdre Wilson and Dan Sperber. Meaning and relevance. Cambridge University Press, 2012.
Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.
Margaret A Boden. Creativity and artificial intelligence. Artificial Intelligence, 103(1-2):347–356, 1998.
Mark Coeckelbergh. How to describe and evaluate “deception” phenomena: recasting the metaphysics, ethics,
and politics of icts in terms of magic and performance and taking a relational and narrative turn. Ethics and
Information Technology, 20(2):71–85, 2018.
Philip R Cohen and Hector J Levesque. Communicative actions for artiﬁcial agents. In ICMAS, volume 95, pages
65–72. AAAI, 1995.
Elena Cabrio and Serena Villata. Natural language arguments: A combined approach. In ECAI 2012, pages
205–210. IOS Press, 2012.
 John Lawrence and Chris Reed. Argument mining: A survey. Computational Linguistics, 45(4):765–818, 2020.
Stefan Sarkadi, Peter McBurney, and Simon Parsons. Deceptive storytelling in artificial dialogue games. In
Proceedings of the AAAI 2019 Spring Symposium Series on Story-Enabled Intelligence, 2019.
Roger C Schank and Robert P Abelson. Scripts, plans, and knowledge. In IJCAI, volume 75, pages 151–157, 1975.