Peter R. Lewis
Faculty of Business and Information Technology
Ontario Tech
Ontario, Canada
Ștefan Sarkadi
Dept. of Informatics
King’s College London
London, UK
January 27, 2023

arXiv:2301.10823v1 [cs.AI] 25 Jan 2023
Abstract

Artificial Intelligence (AI) is about making computers that do the sorts of things that minds can do, and
as we progress towards this goal, we tend to increasingly delegate human tasks to machines. However,
AI systems usually do these tasks with an unusual imbalance of insight and understanding: new,
deeper insights are present, yet many important qualities that a human mind would have previously
brought to the activity are utterly absent. Therefore, it is crucial to ask which features of minds we
have replicated, which are missing, and whether that matters. One core feature that humans bring to tasks,
when dealing with the ambiguity, emergent knowledge, and social context presented by the world, is
reflection. Yet this capability is utterly missing from current mainstream AI. In this paper we ask what
reflective AI might look like. Then, drawing on notions of reflection in complex systems, cognitive
science, and agents, we sketch an architecture for reflective AI agents, and highlight ways forward.
1 Introduction
Margaret Boden has described artificial intelligence as being about making ‘computers that do the sorts of things that
minds can do’ [
, p1]. One strength of this definition lies in the fact that it does not start from an arbitrary description of
the things that might be necessary or sufficient for a system to ‘count’ as AI, but it encourages us to ask: what are the
sorts of things that our minds do? Further, a curious mind is then tempted to ask: could we replicate these things? If so,
how? If not, why not and does that matter?
The definition also implies that there are things that human minds currently do, that in the future computers might do
instead. This is not only true now, but has always been the case throughout human history, and likely will be far into
the future. It is this transference of activity that gives rise to the seemingly constant stream of examples of new ‘AI
technologies’. These mostly do things that human minds used to do, or wished to do. This is, of course, also the source of
many of the issues and benefits that arise from the creation and use of AI technology: as we figure out how to replicate
some of the things that minds can do, we delegate these things to machines. This typically brings increased automation,
scale, and efficiency, which themselves contain the seeds of both enormous potential social and economic benefit, and
potential real danger and strife.
Further, we can notice that these AI technologies usually do this with an unusual (im)balance of insight and understand-
ing. New, deeper insight and understanding often arise from the models employed, while many of the ‘qualities’ that a
human mind would have previously brought to the activity, are utterly absent.
In designing and analysing embodied AI technologies, the concept of an intelligent agent is central, and necessitates
descriptions that are abstracted from the natural intelligences they are inspired by or modelled on. This abstraction, in
turn, means that the notion of an AI agent only partially captures the mental, cognitive, and physical features of natural
intelligence. Hence, it is important to ask: are the features that we have included sufficient for what is needed? And are we
satisfied with leaving out those that we have?
Frank and Virginia Dignum have recently reminded us how powerful the concept of an agent is in AI [
]. They have also
pointed out some of the paradigmatic failures of agent-based modelling (ABM) and multi-agent systems (MAS). ABM
methodologies aim to describe a large and complex system of agent populations by using analysis tools in the form
of agent-based simulation. MAS methodologies, instead, focus on the operational side of interacting systems, where
agents operate to create changes in their environment. However, neither methodology is fit for designing human-like
agent architectures. The focus of their discussion [
] is to propose a social MAS architecture and argue that future
socially-aware AI architectures should be different from today’s common utility- and goal-driven models. A similar
proposal was made by Ron Sun years ago, namely that agent methodologies need Cognitive Science, and vice versa [],
to address complex socially-aware AI and be able to design such architectures. Antonio Lieto refreshes this proposal [].
In this paper we continue this line of thought, sketching an agent architecture that captures some reflective capabilities,
based on cognitive theories. Due to the complex and modular nature of reflection, it is impossible to find a single, unique,
and crisp definition of the term ‘reflection’ [
]. Reducing the definition to a single process or component of an
architecture would fail to address the richness of this cognitive process and would be counter-productive in explaining
how all of the processes and components at play interact for human-like reflection to happen. Thus, in order to do it
justice, similarly to [6], we adopt a differentiated theory approach to convey the notion of reflection.
2 Playing Chess Isn’t Just About Chess
So what are the sorts of things minds do? As Richard Bellman suggested in 1978 [
], these include activities such
as ‘decision-making, problem solving, learning...’ And the sheer quantity of research on machines that can do these
activities is astounding. Yet as Bellman’s ellipsis suggests, this is clearly an incomplete list. Perhaps any such list would
be. It might be more useful to think situationally. We can ask: which features of our minds do we bring to different
activities? Let us explore a thought experiment using a canonical example: chess. When playing chess, we largely
bring the ability to reason, to plan ahead, to use heuristics, and to remember and recall sequences of moves, such as the
Caro-Kann Defence. Against an anonymous opponent on the Internet, we might try to use these abilities as best we can.
When playing chess with a child, however, we might typically bring a few more features too: patience, empathy (for
example to understand the child’s current mental model of the game to help coach them), and also some compassion,
since proficient chess players could likely beat most children every time and make it less interesting all round. Letting
children win is also not helpful, but a parent might play out a different sequence of moves to open up more in-game
experiences from time to time. As a young player grows up, benefiting from both more brain development and
experience at chess, and finds joy in different parts of the game, the way an adult opponent might do this will change. A
good teacher might think back over previous games, reflect on the changes in the child’s understanding and reasoning,
and responses to moves. They might use this to speculate on and mentally play out possible future games.
This chess example illustrates three points: (i) even playing chess is not just about problem solving; (ii) rather
unsurprisingly, our mental features are rich, contextual, and flexible; and (iii) we reflect on our situations, our current
and past behaviour in them, and the likely outcome of those behaviours including the impact on others, in order to
choose which mental features to engage. This is not just about flexible behaviour selection; it’s about which mechanisms,
which of Boden’s ‘sorts of things’, even kick in. What can this teach us about how we might want to build AI systems?
Returning to the idea that we are delegating mental activity to machines, it tells us that perhaps we might want to have a
similar ability in AI agents.
3 The Dangers of Incomplete Minds
Many people tie themselves in knots trying to define ‘intelligence’, hoping that this will lead us to a somewhat more
complete (and, they often say, more helpful) definition of ‘artificial intelligence’. One example of such a discussion can
be found in a recent special issue of the Journal of Artificial General Intelligence [
]. As pointed out by Sloman in that
collection, much of this definitional wrangling misses the point, at least from the perspective of deciding when we want
to accept a computer to replace part of the activity previously done in society by human minds. Better questions might
be to ask: what can this thing do; and what is it for? Consider: if we are deciding to put a machine in a position where it
is carrying out a task in a way that we are satisfied is equivalent to what previously only a human mind could do, we
have admitted something about the nature of either the task, or the machine, or our minds. Perhaps an AI system is
simply a machine that operates sufficiently similarly to our mind, at least in some situations, that we are prepared to
accept the machine operating in lieu of us. So this leads us to ask when and why we would be prepared to accept this.
Or perhaps, given that most AI systems (and minds) cannot be fully understood or controlled, when and why we would
be prepared to trust one to do so [9].
In one recent example, the seemingly harmless act of allowing a ‘smart’ voice assistant to propose entertainment
activities to a child led to a life-threatening suggestion from a supposedly trusted AI. Normally, when delegating the
proposal of children’s play activities, we would expect that the person we had delegated that to would have not only a
decent dose of common sense, but also the ability to consider the potential consequences of any ideas that sprung to
mind before vocalizing them.
In another now well-known example, Amazon’s automated recruiting tool, trained on data from previous hiring
decisions, discriminated based on gender for technical jobs [
]. Here, the delegation is from professional recruiters and
hiring managers to a computer that replicates (some of) the mental activity they used to do. The aims are automation,
scale, and efficiency. That such a sexist system was put into practice at all is at the very least unfortunate and negligent.
It is also tempting to argue that these are ‘just bad apples’, and that better regulation is the answer. It may be. But what
is particularly interesting in our context is that people (hiring managers, shareholders, applicants) trusted the system
to do something that, previously, a human mind did. But unlike the mind of the professional it replaced, it had no way
of reflecting on the social or ethical consequences, or on the virtue or social value of its actions, or even if its actions
were congruent with prevailing norms or values. That it had no way of reflecting on this meant that it also stood no
chance of stopping or correcting itself. Indeed, neither of the above AI systems even had the mental machinery to do
such a thing: this part of the mental activity is, as yet, nowhere near delegated. This leads to an unusual divorce of
accompanying mental qualities that would normally work in concert. No wonder the behaviour might seem a little
pathological. As humans, a core part of our intelligence is our ability to reflect in these ways; reflection is a core mental
mechanism that we use to evaluate ourselves. The existence of this form of self-awareness and self-regulation can be
key to why others may find us trustworthy. Could we expect the same of machines?
4 The Role of Reflection in Driving Human Behaviour
One aspect of reflection is captured by what Socrates called his daemon [
], something that checked him ‘from any act
opposed to his true moral and intellectual interests’ [
]. Socrates saw this as a divine signal, not proposing action,
but monitoring it, and intervening if necessary. If such a check were based on morals or ethics, we might call this a
conscience. If it were based on broader goals than simply the immediate (for example, choosing a chess move to make
against your daughter), we might call this considering the bigger picture. Essentially, this is a process that notices
what we are thinking, what we are considering doing, and allows and explores the thought, but can prevent the action.
It decides whether to do this by contextualising the action. Contexts, as alluded to above, might be ethical, cultural,
political, social, or based on non-immediate (higher-level, longer-term, or not immediately visible) goals.
What Socrates presents here requires a ‘Popperian’ mind, according to Dennett’s Tower of Generate and Test [].
Essentially, in what he describes as a framework for ‘design options for brains’, Dennett notes that (at the bottom of
the Tower) the testing of hypotheses is done by Darwinian evolution: hypotheses are generated through mutations and
the placing of novel organisms in the world and tested through their survival. Above this, Skinnerian creatures test
hypotheses by taking actions and learning in an operant conditioning fashion, based on environmental feedback within
their lifetime. Higher still are Popperian and Gregorian creatures, which have the mental capability to bring hypothesis
testing internally to their mind, rather than requiring it to be done in the world. Both of these operate with forms of
reflection: put simply, Popperian creatures think about what to think, and Gregorian creatures, using tools, language,
and culture, extend this to think about how to think.
One plausible way these Popperian and Gregorian creatures’ minds might work is Hesslow’s Simulation Theory of
Cognition [
]. Hesslow’s hypothesis is that there exists a mechanism in the brain that helps agents reason about
the consequences of their actions in an environment by simulating the stimuli of their behaviour in that environment,
without having this behaviour previously reinforced by actual stimuli generated by their past behaviour. For example,
this mechanism allows an agent to think about the deadly consequences of driving towards a concrete wall at high speed
without having done it beforehand.
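Hesslow’s hypothesis can be caricatured computationally: the agent rolls candidate actions forward through an internal world model and rejects those whose simulated consequences are unacceptable, without ever executing them. The sketch below is purely illustrative; the world model, horizon, and harm predicate are our assumptions, not part of the theory itself.

```python
def simulate(world_model, state, action, horizon=3):
    """Roll a candidate action forward through the internal world model."""
    for _ in range(horizon):
        state = world_model(state, action)
    return state

def choose_action(candidates, world_model, state, is_harmful):
    """Keep only actions whose simulated outcome is not predicted to be harmful."""
    safe = [a for a in candidates
            if not is_harmful(simulate(world_model, state, a))]
    return safe[0] if safe else None  # None: no safe action was found

# Toy usage: position along a road with a wall at 10.0; 'accelerate' is
# predicted (not experienced!) to end in a crash, so 'coast' is chosen.
wall_at = 10.0
model = lambda pos, act: pos + (4.0 if act == "accelerate" else 1.0)
harmful = lambda pos: pos >= wall_at
print(choose_action(["accelerate", "coast"], model, 0.0, harmful))  # → coast
```

The crucial property is that the harmful action is eliminated by internal simulation alone, with no reinforcement from actual past stimuli.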
Schön [16] compares the inherent nature of reflection in professional practice with a purely technically rational
approach that might be characterised by up-front specification and subsequent problem solving. As opposed to
passive problem solving, active experimentation is emphasised in how professional practitioners deal ‘on-the-fly’ with
uncertainty, ambiguity, and emergent knowledge inherent in tasks. From a technical rationality perspective, Schön
argues, ‘professional practice is a process of problem solving’, yet, ‘in real-world practice, problems do not present
themselves to the practitioner as givens. They must be constructed from the materials of problematic situations which
are puzzling, troubling, and uncertain.’ This means that the sorts of things that (to continue the example) a professional
recruiter does is not simply problem solving in a defined setting: there are patterns and mechanical aspects to their
work, but the problem is always somewhat uncertain, and emerges from practice and the setting. Thus, we arrive at
what Schön describes as ‘an epistemology of practice which places technical problem solving within a broader context
of reflective inquiry’.
Figure 1: Kolb’s ‘Experiential Learning Model’. Source: []. The model captures the cognitive cycle in humans
that is responsible for learning from experience.
A model that captures reflection in practice, that is both exploratory and governed by a sense of the bigger picture
and the principles that govern our intended direction, is Kolb’s learning cycle [
]. His ‘Experiential Learning Model’
comprises four phases: i) having a concrete experience, ii) an observation and subjective evaluation of that experience
in context, iii) the formation of abstract concepts based upon the evaluation, and iv) the formation of an intention to test
the new concepts, leading to further experience.
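As a rough computational rendering of this cycle, consider the loop below. The phase boundaries follow Kolb; everything inside them (the toy agent, environment, and update rules) is an assumption made purely for illustration.

```python
class Agent:
    """Toy learner whose internals are illustrative assumptions only."""
    def __init__(self):
        self.intention = 0.0   # what the agent currently intends to try
        self.concepts = []     # abstract concepts formed so far

    def reflect_on(self, experience):
        # ii) observe and subjectively evaluate the experience in context
        return experience - self.intention

    def conceptualise(self, observation):
        # iii) form an abstract concept based upon the evaluation
        self.concepts.append(observation)
        return sum(self.concepts) / len(self.concepts)

    def plan_test(self, concept):
        # iv) form an intention to test the new concept
        return self.intention + concept

class Environment:
    def act(self, intention):
        # i) acting on an intention yields a new concrete experience
        return intention + 1.0

def kolb_cycle(agent, environment, iterations=3):
    experience = environment.act(agent.intention)        # i) concrete experience
    for _ in range(iterations):
        observation = agent.reflect_on(experience)       # ii) reflective observation
        concept = agent.conceptualise(observation)       # iii) abstract conceptualisation
        agent.intention = agent.plan_test(concept)       # iv) active experimentation...
        experience = environment.act(agent.intention)    # ...leading to further experience
    return agent
```

Each pass through the loop turns a concrete experience into an observation, an abstract concept, and a new intention to test, which in turn yields further experience.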
5 Where are we now?
Reflection in humans is complex and comprises numerous related phenomena. This makes it extremely difficult, if not
impossible to find a single, crisp, and clear definition. This is usually the case with complex socio-cognitive phenomena
(cf. [5]). Our approach instead is to contribute to building a ‘differentiated theory’, as is often done in social
psychology [
]. This allows us to collect and compare the different ways in which phenomena all commonly referred to
as being part of ‘reflection’ interact. In doing so, we aim to build towards a socio-cognitive theory of reflection in AI.
Let us first examine the current state of AI, in this light.
Figure 2: Critic Agent Architecture [
]. We introduce this architecture as a baseline AI architecture that manages to
capture various aspects of perceiving, learning, planning, reasoning, and acting as different qualitative processes. One
can visually contrast this architecture with the other mainstream architectures, old and new, in AI.
Artificial Neural Network Architectures (ANN)
Initially introduced by McCulloch and Pitts [22], and in the form of a simple perceptron by Rosenblatt [23], ANNs
have gained major traction in the AI community. ANNs are highly applicable in
the domain of statistical machine learning, in which they are trained to perform various tasks, and outperform humans in
quite a few of these tasks []. However, like any supervised learning model, ANNs’ over-reliance on historical data
means that they learn to repeat what has been done, not what ought to be done. Coupled with their largely black-box
nature, this leads to a propagation of existing systemic biases that is difficult to identify or address. Post-hoc methods to
interpret and ‘explain’ ANN-based models, such as LIME and SHAP [], result not in an explanation of the internal
mechanics of ANNs, but in approximations in the form of equivalent interpretable models. For instance, in order to explain
a deep ANN, a decision tree or a heat-map is generated as an approximate function between the inputs and the outputs
of the deep ANN. This may, perhaps, be seen as a form of external, open-loop reflection (but typically by others, not by
the system itself); in and of itself, the architecture of (feed-forward) ANNs has no capability for reflection.

Figure 3: Feed-forward ANN architecture with backpropagation (left, by Colin Burnett) and GAN architecture
(right) []. Considering these common machine learning architectures, it is clear that there is a lack of any reflective
‘loop’. Although these achieve different outcomes, they are qualitatively equivalent in the sense that they both operate
at a single level of abstraction when it comes to information processing. There is no self-reference: the loops in both
cases are for feedback, in much the same way that the Critic Agent operates. Additionally, even though Kolb’s model of
experiential learning is a model of learning in humans, it also presents (albeit at a high level) qualitative processes that
ANNs and GANs do not.
Generative Adversarial Network Architectures (GAN)
GANs [
] are one recent example of how ANNs can be
used as building blocks within an explicitly designed architecture. These pitch two multi-layered perceptrons (ANNs)
against each other in a 2-player minimax game. The higher-level architecture here captures the sort of competitive
creative co-adaptation found within co-evolutionary systems.
When it comes to the human ability of reflection, GANs by themselves are incapable of representing the process. While
their architecture contains a feedback loop, it does not operate at the meta level: the architecture is ‘flat’. It is not
generally considered that a GAN (or coevolution in general) adds any type of high-level cognitive process. While
ANNs in general are just clusters of interconnected nodes with weighted edges, and the same may be said of the brain,
we contend that there are essentially two approaches to generating cognitive processes of this type: one is a complex
systems approach, where the virtual machine [
] operationalizing cognition emerges through complexity; the
second is through an explicit architecture, as we do in this paper. Because AI agents that solely use ANNs such as these
cannot reflect on themselves and the consequences of their actions in the world, they can behave anti-socially with
no ability to know this.
Practical Reasoning Architectures
Regarding the difference between reflection and deliberation employed in
practical reasoning, deliberation is a process for thinking out decisions, whereas reflection is a higher-level process
that situates the agent that performs deliberation in a context through abstract conceptualisation. Deliberation does not
require self-representation through abstract conceptualisation, because deliberation can be done at symbol-level, e.g.,
implementing deliberation strategies and selecting them using a procedural reflection process (a rather reductive notion
of reflection). Procedural reflection is a form of meta-reasoning in PRS, the Procedural Reasoning System (based on
Lisp []; see Fig. 4), on which later BDI models [] (Fig. 6) were based. The PRS/BDI type of reflection consists of
passing symbols from the previous state (or lower symbolic level) to the current one, such that we can say what the
system was up to in the previous state []. There is no nested reasoning or self-representation, both of which are crucial
for many forms of reflection.
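The reductive character of procedural reflection can be made concrete in a few lines: symbols describing the previous reasoning state are simply carried forward, so the system can report what it was up to, but it never reasons over a nested representation of itself. The class and state symbols below are illustrative assumptions.

```python
# Toy illustration of 'procedural reflection' in the reductive PRS/BDI sense:
# the previous state's symbols are passed up to the current state, and nothing more.
class ProceduralReflector:
    def __init__(self):
        self.previous_state = None

    def step(self, current_state):
        # Report the old symbols, then adopt the new state. There is no nested
        # reasoning here: the 'reflection' is just symbol passing.
        if self.previous_state is None:
            report = "no prior state"
        else:
            report = f"was pursuing {self.previous_state}"
        self.previous_state = current_state
        return report

agent = ProceduralReflector()
agent.step("goal: pick-up-block")
print(agent.step("goal: stack-block"))  # → was pursuing goal: pick-up-block
```

The system can thus say what it was up to in the previous state, but it cannot, for instance, reason about why it was pursuing that goal, or whether it should have been.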
Figure 4: PRS architecture (Source: [
]). The PRS architecture is similar to the Critic agent architecture in the sense
that it allows us to break down different qualitative processes. The difference between the Critic architecture and the
PRS is that the PRS does not include a learning component, but it has a richer representation of the processes and
elements responsible for driving the reasoning behind the actions that are executed in the environment. PRS also
allows for an eventual learning component to be plugged into the system interface, which feeds data into the belief base.
BDI Architectures
Even procedural reflection has only recently been implemented, using more recent architectures
such as BDI. According to a recent survey on BDI agents [
], only one paper was identified that implemented reflection
in BDI architectures. In that work [
], BDI agents can use system-wide instructions to identify the context in which
they operate and this enables them to use a rather reductive notion of reflection called procedural reflection to select
deliberation strategies [33].
Regarding the distinction between active experimentation and exploration: active experimentation is also a multifaceted
process, one that includes at least exploration and active learning at the meta-level, as well as intentional reconceptualizations
of existing knowledge (cf. Dennett’s Tower of Generate-and-Test []), in order to be able to reflect on the value of
new models.
Domain Expert Systems
These, such as tutoring expert systems, do not replicate reflection beyond a rudimentary
sense either. Advanced domain-expert tutoring systems that are based on architectures like ACT-R [] and that
implement some learning theory are reflective, but only in the procedural, Lisp-style sense that we explained above
(see Fig. 5) []. Another issue with systems like ACT-R is that they are architectures for domain expert systems where
the environment is part of the system, not agent-based architectures like the critic agent architecture (Fig. 2) where
agents act in an observed environment.
To summarise, the ANN architectures discussed above do not allow for reflection to be captured. Conversely, PRS, BDI,
and ACT-R do not exclude it; neither do they explicitly describe it.
6 Building Reflective AI Agents
In order to make an agent reflective, thus expanding the list of Boden’s ‘sorts of things’, we first need an architecture.
We must separate out reflection from decision making and action.
Figure 5: ACT-R architecture []. It is crucial to note that ACT-R is not an AI agent architecture, but rather a cognitive
architecture that has been used as an expert system. The original purpose of ACT-R is to map and understand human
cognition as a set of modular components that execute procedures to produce behaviour in a specific domain. ACT-R
assumes that all cognitive components are represented and driven by declarative and procedural memory.
Second, we need a suite of reflective cognition processes that may be included depending on the form of reflection
desired. A given instance of a reflective agent may have one or more or all of these processes, in line with the
differentiated theory approach. We categorise these processes in four tiers:
Tier 1 Reflective Agent:
This incorporates models of self and others, and a process to reason using these models in
order to ask itself what-if questions concerning generated actions. This enables a Popperian-style consequence engine
and reflective governance process, able to evaluate proposed actions in context (acknowledging that context can change)
and at least block some actions.
Tier 2 Reflective Agent:
Adds processes that learn new reflective models, including incorporating feedback from new
experiences into them incrementally. This addition enables Kolb-style reflective experiential learning.
Tier 3 Reflective Agent:
Adds a reflective reasoning process that proposes not only a single ‘optimal’ solution, but
is ready to present a diversity of possible ways forward, as hypotheses to be tested, based on different approaches to
solving the problem (including safe ways of disengaging from it).
There are many ways an agent may generate proposed actions. These vary in complexity substantially, from simple
randomised search (e.g. mutation or exploration) through to heuristic and guided search approaches, up to potentially
advanced forms of artificial creativity and strategic planning.
These approaches provide the ability to deliberate about novel possible strategies for action, including in new or
potential (imagined) situations, and to evaluate these internally by reasoning with the reflective models.
Tier 4 Reflective Agent:
Adds the ability to re-represent existing learnt models in new ways. This facilitates new
reasoning possibilities and the potential for new insights. It provides a Gregorian-style ability to change the way the
agent reflects.
Third, we need a way of representing the broader context: we need models of prevailing norms, and of the social values
associated with the outcomes of different possible actions, and of other higher-level goals that may not be immediately
or obviously relevant to the task.
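Combining the Tier 1 capability with such context models, a reflective governance process might look schematically like the following. The what-if function and norm predicates are placeholders we assume for illustration; in a real agent they would be backed by learnt models of self, others, norms, and values.

```python
# Illustrative Tier 1 reflective governance: models of self/others feed a
# what-if process over generated actions, which can block (but not invent)
# actions. All names and predicates are assumed for illustration.
def reflective_governor(proposed_actions, what_if, norms, context):
    """Evaluate each proposed action in context; pass only those no norm blocks."""
    permitted = []
    for action in proposed_actions:
        outcome = what_if(action, context)           # ask a what-if question
        if all(norm(action, outcome, context) for norm in norms):
            permitted.append(action)                 # action survives reflection
    return permitted                                 # possibly empty: all blocked

# Toy usage, echoing the chess example: a norm blocks actions whose predicted
# outcome harms the other agent in the current social context.
what_if = lambda action, ctx: {
    "harms_other": action == "win-fast" and ctx["opponent"] == "child"
}
no_harm = lambda action, outcome, ctx: not outcome["harms_other"]
print(reflective_governor(["win-fast", "coach-move"],
                          what_if, [no_harm], {"opponent": "child"}))
```

Note that the governor allows and explores the thought (the what-if evaluation) while retaining the power to prevent the action, matching the daemon-like check described in Section 4; an empty result corresponds to safely disengaging.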
Note that these components are mostly not new, but it is their novel combination and integration that provides new
capability. Indeed, there are now several decades of work on reflective architectures, including early work like Landauer
and Bellman’s Wrappings [
], and Brazier and Treur’s [
] specification for agents that can reason reflectively about
Figure 6: BDI architecture (by Jomi F. Hübner). The BDI architecture was designed to help AI agent designers build intuitive and
interpretable AI agents capable of practical reasoning. The architecture depicts different qualitative processes and
elements responsible for meta-reasoning (deliberation) and belief revision (BRF), which then help the agent decide
what to do in a given circumstance in order to achieve their goals/desires in a dynamic environment.
information states. More recently, Blum et al’s Consequence Engine architecture [
], the EPiCS architecture [
] and
the LRA-M architecture [39] are all aimed explicitly at achieving computational self-awareness through reflection.
On this broader point, self-awareness, often considered as the capacity to be ‘the object of one’s own attention’ [],
has long been targeted as a valuable property for computational systems to possess [
], owing to the value of
its functional role and evolutionary advantage in biological organisms [
]. Computational forms of self-awareness
require reflective processes that access, build, and operate on self-knowledge [
]. This self-knowledge is typically
described according to five ‘levels of self-awareness’ [] rooted in the work of Neisser [47], although these may
consider many other aspects [
]. In some cases these are trivial self-models, for example a smartphone may have an
internal parameter that captures whether its charging port contains moisture. Slightly more complex, the device may
learn an internal model of its typical charging behaviour, sufficient to act on meaningfully, and this model may adapt as the
battery degrades. In more complex examples still, a cyber-physical system may have a model of available resources
discovered at run-time [49].
Learning and reasoning with self-knowledge requires a reflective self-modelling process [
] of the type described
here. The exact form of such learning and self-modelling will vary depending on requirements and situation, but some
examples include self-modelling based on abstraction from run-time data (e.g., [51]), or simulation of oneself
in the environment []. As Blum et al. demonstrate, such simulations may be used as ‘consequence engines’,
similarly to how [14] describes the ability of the human brain to execute processes of internal cognitive simulation.
The LRA-M model proposed by Kounev et al. [39] (Figure 7), a commonly used reflective architecture that comes from
the area of self-adaptive systems research, captures computational reflection at an abstract level. However, this leaves
unclear several aspects associated with agents, e.g., what process generates the actions? Comparing this with a standard
learning-based Critic agent [
], we can see the inverse is true: learning and action selection are present, but reflection
is not (see Figure 2).
Hence, here we propose one way to integrate the architecture of learning agents with the reflective schema captured
by Kounev et al. In this way, a reflective architecture enables information to be abstracted and reasoned with at the
meta-level, feeding back to update goals for learning, and to regulate behaviour.
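One schematic reading of this synthesis is sketched below: the object level runs a standard Critic-style learning loop, while a meta-level abstracts self-models, reasons over them, and feeds back revised goals or a veto. Every component function here is an illustrative stand-in, not the authors’ implementation.

```python
def reflective_critic_step(agent, percept):
    # --- Object level: a standard Critic-style learning-agent loop ---
    feedback = agent["critic"](percept)          # critic evaluates the percept
    agent["learner"](agent, feedback)            # learning element updates state
    action = agent["performer"](agent, percept)  # performance element proposes an action

    # --- Meta level: the reflective (LRA-M-style) loop ---
    self_model = agent["abstract"](agent, percept)   # learn/update models of self
    appraisal = agent["reason"](self_model, action)  # reason over those models
    if appraisal["revise_goals"]:                    # feed back new learning goals
        agent["goals"] = appraisal["new_goals"]
    if appraisal["block"]:                           # regulate: withhold the action
        return None
    return action

# Toy usage: reflection withholds actions taken under large prediction error.
agent = {
    "goals": {"target": 5.0},
    "critic": lambda p: p - 5.0,
    "learner": lambda a, fb: a.update(last_error=fb),
    "performer": lambda a, p: "proceed",
    "abstract": lambda a, p: {"error": a["last_error"]},
    "reason": lambda m, act: {"revise_goals": False, "new_goals": None,
                              "block": abs(m["error"]) > 10.0},
}
print(reflective_critic_step(agent, 7.0))   # small error → 'proceed'
print(reflective_critic_step(agent, 40.0))  # large error → None (withheld)
```

The key structural point is that the meta-level operates on abstractions of the object level and regulates it, rather than merely passing data forward between modules.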
We motivate our choice to base our architecture on Russell & Norvig’s [] Critic Agent, and further to use Kounev
et al’s [] reflective loop for discussing reflection in AI, since they enjoy broad understanding and acceptance in the
domains of agent architecture and computational reflection, respectively. We chose the Critic Agent not because it is the
best or most state-of-the-art for any particular domain, but because it allows us to illustrate how reflection can be incorporated
Figure 7: LRA-M Reflective Architecture. Source: [
]. The Learn-Reason-Act-Model (LRA-M) model was designed
as a reference architecture to capture the essential components of computational self-awareness and their relations.
Strictly speaking, acting is considered optional, depicted by the dashed line, though it is typically the purpose of the
self-awareness in a practical system. The circular arrows signify that learning and reasoning are ongoing processes at
run-time, based on streaming data from ongoing observations of the world and oneself. Learning and reasoning also
operate on existing internal models, including processes such as re-representation, abstraction, and planning.
into a very widely used and understood standard agent architecture. This, we hope, makes the article and argument more
accessible. While there are also many reflective loops that we could have chosen, Kounev’s is one that enjoys broad
support, particularly from the self-adaptive systems community. Indeed, the article that presents it was the result of a
large community effort at a Dagstuhl Seminar. Thus, while we acknowledge (and hope) that many other architectures
can be paired with other forms of reflective loop, in this article, we use these two as an illustration and first step.
What we can now see is missing is a simple, generic schema for how reflection relates to existing agent architectures
commonly used in modern AI. We propose such an architecture, as a synthesis of Russell and Norvig’s critic agent and
Kounev et al’s LRA-M architecture for reflective computational self-awareness, illustrated in Figure 8.
The advantage of an architectural approach is that it describes a separate set of processes, and we know that building
systems that self-monitor is easier using an ‘external layer’ style [
]. Further note that what we propose is not an
architecture that passes on information from one module to another as is the case in numerous hybrid approaches that
aim to marry symbolic and sub-symbolic models [
]. What we propose is a cognitive architecture for reflection which
can interpret information before passing it from one module to another. The interpretation of information is dynamic
and happens in multiple processes. Below we describe how our proposed architecture ensures information interpretation
at different cognitive levels through process flows.
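As one way to make the module boundaries concrete, the components named in Figure 8 can be sketched as a minimal skeleton. This is an illustrative sketch under our own naming assumptions; the class and method names are not a prescribed API:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ReflectiveAgent:
    """Minimal skeleton of the proposed architecture (Figure 8).

    The operational layer (critic, learning element, performance element)
    follows Russell & Norvig's critic agent; the reflective layer follows
    the LRA-M loop. All names are illustrative assumptions.
    """
    reflective_models: dict = field(default_factory=dict)  # self-/world-models
    extrinsic_goals: list = field(default_factory=list)    # higher-level oughts

    # --- operational layer --------------------------------------------------
    def critic(self, percept: Any) -> float:
        """Score recent behaviour against the current performance standard."""
        return 0.0

    def learning_element(self, feedback: float) -> None:
        """Update the performance element based on the critic's feedback."""

    def performance_element(self, percept: Any) -> str:
        """Select an intended action from the current policy (stubbed)."""
        return "move"

    # --- reflective layer ---------------------------------------------------
    def reflective_learning(self, percept: Any) -> None:
        """Abstract run-time data into concepts stored in reflective models."""
        self.reflective_models.setdefault("observations", []).append(percept)

    def reflective_reasoning(self, intended_action: str) -> str:
        """Reason over reflective models and goals; may veto or replace the
        intended action (Flow 1: governing behaviour)."""
        forbidden = self.reflective_models.get("forbidden", set())
        return "noop" if intended_action in forbidden else intended_action

    def step(self, percept: Any) -> str:
        """One sense-reflect-act cycle."""
        self.reflective_learning(percept)
        self.learning_element(self.critic(percept))
        return self.reflective_reasoning(self.performance_element(percept))
```

The key design point is that the reflective layer sits between the operational layer and the actuators, so governance requires no change to the underlying policy.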
There are many information flows enabled by the addition of this reflective capability. Here, we sketch some of the most
obvious and perhaps important ones, particularly those that link to the conceptual discussion above. We categorise the
flows according to the corresponding tiers of reflective agents:
Tier 1 Flow (Governance):
Figure 8: Proposed Reflective Agent Architecture.
Flow 1: Governing Behaviour:
Actuators → Reflective Reasoning → Actuators
E.g. intervening to prevent an intended action.
Tier 2 Flows (Integrating experience and external factors):
Flow 2: Abstract Conceptualization of Experience
Sensors → Reflective Learning → Reflective Models → Reflective Reasoning → Critic.
E.g. Kolbian experiential learning through conceptualization of new and changing experiences.
Flow 3: Learn about and integrate new extrinsic factors into operational goals:
Higher-Level Extrinsic Goals → Reflective Learning → Reflective Models → Reflective Reasoning → Critic.
E.g. learning about and integrating new external factors, such as social norms, standards, and new user
preferences, discovered in the environment, such as signs, verbal instructions, and observation of behaviour.
Flow 4: Integrate new design goals into existing reflective models and operational goals:
Higher-Level Extrinsic Goals → Reflective Models → Reflective Reasoning → Critic.
E.g. integrating externally specified design goals, such as updated standards or new user preferences, directly
into existing reflective models and operational goals.
Tier 3 Flows (Critique and Imagination):
Flow 5: Active Experimentation to Improve Potential Behaviour
Actuators → Reflective Reasoning → Critic → Learning Element → Performance Element → Actuators
E.g. Using the information that an action was intervened upon in order to adapt what the learning element
learns, and hopefully avoid the situation in future. Or, creatively proposing novel courses of action and testing
hypotheses regarding them.
Flow 6: Reflecting on effectiveness of current operational goals and progress towards them:
Reflective Reasoning → Critic → Reflective Reasoning.
E.g. Counterfactual reasoning about current and potential goals, the ‘what’ of operational learning; black-box
reasoning about progress towards them, for example asking ‘am I stuck?’ or ‘would a different reward
function better serve my high-level goals?’
Flow 7: Reflecting on the current mechanisms of learning:
Reflective Reasoning → Learning Element → Reflective Reasoning.
E.g. White-box reasoning about current operational learning mechanisms, the ‘how’, for example asking ‘how
am I learning to do this?’ and ‘could I try to learn in a different way?’
Tier 4 Flow (Re-Representation):
Flow 8: Reflective Thinking:
Reflective Reasoning → Reflective Models → Reflective Reasoning
E.g. refactoring and reconciling models, re-representing existing conceptual knowledge, concept synthesis.
One important and powerful insight is that these information flows can be treated as ‘primitives’ and composed to
provide additional and more complex cognitive features. For example, the composition of Flows 2 & 6 could give rise
to curiosity-driven behaviour, while adding Flow 8 to this allows the result of the curiosity to be integrated into existing
knowledge. Similarly, Flows 1 & 8 support reflecting on behaviour governance, for example reconciling competing
imperatives, assessing the effectiveness of an intervention, or deliberating over an action. Adding Flow 6 to this, permits
the deliberation to not only act over potential actions, but over potential directions for future learning.
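Treating the flows as composable primitives can be prototyped as plain function composition over a shared state. The flow implementations below are deliberately toy stand-ins, and all names are illustrative assumptions:

```python
from functools import reduce
from typing import Any, Callable, Dict

State = Dict[str, Any]
Flow = Callable[[State], State]


def compose(*flows: Flow) -> Flow:
    """Compose information flows left-to-right into one cognitive feature."""
    return lambda state: reduce(lambda s, f: f(s), flows, state)


# Toy stand-ins for three of the flows described above.
def flow2_conceptualise(state: State) -> State:
    """Flow 2: abstract a raw observation into a concept."""
    state["concepts"] = state.get("concepts", []) + [f"concept_of({state['observation']})"]
    return state


def flow6_goal_reflection(state: State) -> State:
    """Flow 6: ask whether the current goals are still being served."""
    state["goal_review"] = "progressing" if state.get("concepts") else "stuck"
    return state


def flow8_rerepresent(state: State) -> State:
    """Flow 8: reconcile new concepts with existing knowledge."""
    state["knowledge"] = sorted(set(state.get("knowledge", []) + state["concepts"]))
    return state


# Flows 2 & 6 compose into a curiosity-like check; adding Flow 8
# integrates what curiosity found into existing knowledge.
curiosity = compose(flow2_conceptualise, flow6_goal_reflection, flow8_rerepresent)
result = curiosity({"observation": "novel_event"})
```

Under this reading, a "cognitive feature" is simply a particular composition of flow primitives, which matches the composition examples given in the text.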
There are a number of research challenges here. Specifically, there is a need to understand how to operationalize
the above information flows, including operational semantics, APIs, and methods for the semantic transformation of
information from symbolic to sub-symbolic levels and vice-versa.
7 How Do We Get There?
Many of the individual components required to realize Reflective AI already exist. In some cases, the challenge is
to integrate these in a purposeful way to achieve the vision set out above. In other cases, there remain important
fundamental research challenges. In this Section, we outline some of these in key areas.
7.1 Reflective Learning
Reflective Learning lies conceptually at the core of the proposed architecture. Fundamentally, learning here provides
two forms of modelling capabilities: abstraction and simulation, which support reasoning in complementary ways.
Abstraction is key in both learning and re-representation and facilitates reflective reasoning: imagine you have two
cognitive modules with information passed between them (cf. [19]). If the agent does not have abstraction
capabilities, then it will keep using the two modules to play ping-pong while the ball remains unchanged as it passes
from bat to bat. However, if the architecture contains processes that transform the information, then it is no longer a
game of ping-pong, but instead a dialectic where new understanding emerges.
However, abstract conceptualisation capabilities have not entered mainstream AI research yet. As illustrated in Figure 8,
such a reflective process could start from reflective observation, which takes the data output of the Sensors and passes
them to the reflective layer, where it uses a reflective learning process to transform this data into concepts that can then
populate various new or updated self-models (abstract conceptualisation). Reasoning over these models can lead to
intentional active experimentation, targeted at generating new experiences to observe, thus continuing the cycle.
Simulation models support a further form of reasoning, over consequences [ ]. This permits Dennett's ‘Popperian’
mind, where hypothesis generation and testing can be carried out internally to the cognition of the agent, without
requiring the world. For example, in the style of Hesslow [14], an agent may build a simulation model in the form of
a digital twin of itself in its environment. With sufficient interpretability and accompanied by automated reasoning
processes, this may be complemented with an abstract conceptualisation, for example, that provides the understanding
that the simulation contains an evolutionary stable strategy.
Note that neither the architecture nor the concept of reflective learning prescribe a particular learning algorithm. Many
learning techniques can be used. The choice of technique itself is open-ended and can be made to suit the context so
long as it adheres to, we posit, two conditions. First, that it is model-based, such that the process of learning produces a
model that captures some knowledge about the system and its environment. Interpretable models should be favoured, as
the reasoning processes may then operate on these interpretations automatically. Second, that it operates online, such
that it can incrementally build and update these models and be used in an anytime fashion.
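A minimal sketch of a learner meeting both conditions, assuming a toy model (running per-context means) rather than any particular algorithm from the literature:

```python
from collections import defaultdict


class OnlineModelLearner:
    """Incrementally builds an interpretable model from streaming data.

    Model-based: the learned artefact is an inspectable mapping from
    context to an estimated outcome, which reflective reasoning can read.
    Online/anytime: each observation updates the model in O(1), and the
    model snapshot is usable between any two updates.
    """

    def __init__(self):
        self._count = defaultdict(int)
        self._mean = defaultdict(float)

    def observe(self, context: str, outcome: float) -> None:
        """Incremental mean update for one context (no stored history)."""
        self._count[context] += 1
        n = self._count[context]
        self._mean[context] += (outcome - self._mean[context]) / n

    def model(self) -> dict:
        """Return the current interpretable model (anytime snapshot)."""
        return dict(self._mean)
```

The point is not the statistics but the interface: `model()` can be called at any time, and what it returns is readable by a reasoning process rather than being an opaque policy.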
Lewis et al. [46] note that online learning algorithms are one of the key ingredients in achieving computational
self-awareness. They further note that such online learning must be able to deal with concept drift, since both the
system and its environment change. Wang et al. [56] show how existing online learning algorithms can be used for
reflective self-awareness at different levels, but perhaps most importantly, they intentionally do not propose a preferred
online learning paradigm, rather highlighting that empirical results suggest that using different learning techniques
according to context can lead to enhanced performance. Complementary examples are presented in a collection edited
by Pitt [5], which arrives at a similar conclusion.
In the future, given a mechanism for representing concepts [
], an AI agent could use Kolbian abstract conceptualisation
to form new concepts and more meaningful models of itself and others in a shared system. Simultaneously, an agent
could build simulation models of itself in its environment, to enable Popperian hypothesis testing. Both model forms
provide complementary benefits [
] as forms of reflective modelling for meta-reasoning [
], and in different ways,
require the ability to learn models on-the-fly [59].
Research challenge:
There is a need to develop mechanisms that learn human- and machine-interpretable conceptual
and simulation models from empirical data and semantic information in the world, and further, to develop (unsupervised)
methods for this to be done on the fly in a complex environment.
7.2 Reflective Governance
The proposed architecture captures Socrates's daemon through a governor loop, as a mediator between reflective reasoning
and an agent's actuators. This loop is a process of deliberation at the meta-level. Reflection captures this process and
situates it in a context, i.e., in an agent's model of the self and others in the world, through abstract conceptualisation or
simulation. Thus, the system does not need to re-learn its decision model if something in the set of oughts (higher-level
extrinsic goals) in its situation changes (though it might want to, later). It just needs to check the behaviour against
them, and occasionally say ‘no, that's not appropriate; give me an alternative, try a different approach.’
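A minimal sketch of such a governor loop, assuming a hypothetical acceptability predicate per ought and a bounded number of retries:

```python
from typing import Callable, Iterable, Optional

# An ought returns True if the proposed action is acceptable to it.
Ought = Callable[[str], bool]


def govern(propose: Callable[[], str],
           oughts: Iterable[Ought],
           max_retries: int = 3) -> Optional[str]:
    """Check each proposed action against the current oughts.

    The decision model (`propose`) is left untouched; if an intended action
    violates any ought, the governor says 'no, give me an alternative' and
    asks again, up to `max_retries` times. Returns None if no acceptable
    action is found, i.e., the agent abstains.
    """
    oughts = list(oughts)
    for _ in range(max_retries):
        action = propose()
        if all(ought(action) for ought in oughts):
            return action
    return None
```

Note that updating the oughts requires no re-learning of the proposal mechanism, which is exactly the separation the governor loop is meant to provide.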
Regarding the ethical nature of this, explicitly ethical agents are nothing new, at least since Moor [60] proposed a
way of discerning four different ‘types’. Indeed, the question of imbuing artificial agents with ethical values was the
topic of a special issue of the Proceedings of the IEEE; Winfield et al.'s summary [ ] and Cervantes et al.'s survey [ ]
provide an introduction. And indeed, Winfield and colleagues provided an early example [ ] of putting these kinds of
‘ethical governors’ into robots, as consequence engines [ ]; concerns also exist about whether explicit ethical agents
are a good idea [64].
Research challenge:
There is a need to develop inclusive, participatory methods for capturing values, norms, and
preferences in formal, interpretable models that a) can be translated for use in a critic module to drive learning, b) can
form part of the behaviour governance process, and c) respect the diversity of interpretation of human values that exists.
There is a further need to develop governance and learning processes that adopt these in order to generate and ensure
behaviour aligned with them, as emphasised throughout the Royal Society Special Issue edited by [65].
7.3 Reflective Deliberation
Going deeper still, agents could extend the above with reflective deliberation. Reflective agents can deliberate by using
active experimentation between reflective reasoning and critic (see Figure 8) from time to time to find alternative ways
of approaching problems. When considering finding multiple possible diverse and viable courses of action, we can draw
on the rich and active research activity on dialogues, practical reasoning, and value-based argumentation [ ].
These could help us to find new, different solutions that come at a problem from a novel angle. And when evaluating
these alternatives, we may choose to formulate the very notion of what ‘successful’ means according to our values; and
in adopting these we must acknowledge that the best action may be a compromise. To instantiate this sort of reflection,
agents could employ value-based practical reasoning mechanisms such as action-based alternating transition systems
with values (AATS+V) or dialogue protocols [ ]. In turn, these are used to build argument schemes [ ] which agents
can use for both reflecting on their possible decisions, as well as justifying their decisions by providing explanations
[72, 73].
Research challenge:
Agents need to be able to perform internal simulations of their actions and check the outcomes of
these actions inside their own mind in order to perform deliberation. There is therefore a need to develop semantics and
nested abstract models of the world for agent architectures to enable agents to go beyond the procedural reflection of
BDI and PRS-like systems, by having the capability to run, analyze, and interpret new simulation models on the fly,
according to need. One idea could be to develop polymorphic simulation models that can be instantiated into specific
simulations based on the learnt concepts and the need at hand.
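One way this idea of polymorphic simulation models could be sketched is as a factory that binds a generic simulation template to learnt concepts on demand; the template protocol and all names here are illustrative assumptions:

```python
from typing import Callable, Dict, List

# A simulation template: given a start state and an action, predict outcomes.
SimTemplate = Callable[[dict, str], List[dict]]


def make_simulation(template: SimTemplate, concepts: dict) -> Callable[[str], List[dict]]:
    """Instantiate a polymorphic simulation model for a specific context.

    The same template can be bound to different learnt concepts (different
    dynamics, norms, or agents) to yield different concrete simulations.
    """
    def simulate(action: str) -> List[dict]:
        return template(dict(concepts), action)
    return simulate


def grid_template(state: dict, action: str) -> List[dict]:
    """Toy template: deterministic one-step transition on a line."""
    delta = {"left": -1, "right": +1}.get(action, 0)
    return [dict(state, position=state["position"] + delta)]


def deliberate(simulate: Callable[[str], List[dict]], actions: List[str],
               acceptable: Callable[[dict], bool]) -> List[str]:
    """Keep only actions all of whose simulated outcomes are acceptable."""
    return [a for a in actions if all(acceptable(s) for s in simulate(a))]


# Two simulations instantiated from the same template, different concepts.
sim_near_edge = make_simulation(grid_template, {"position": 0})
sim_mid = make_simulation(grid_template, {"position": 5})
```

Here `deliberate` is the internal check described above: outcomes are tested inside the agent's own simulation before any action reaches the world.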
7.4 Social Context
Mentalistic capabilities, as we have explained in the chess example, play an important role in reflecting on one's
complex decisions. Again, BDI-like agents can be given both the ability to communicate their decisions to other
agents as well as the ability to model the minds of other agents inside their own cognitive architecture, in order to
better coordinate, or even delegate, tasks [ ]. Social interactions can be modelled and implemented with dialogue
frameworks so that agents can explain and justify their behaviour [ ]. Modelling social context is a rich research
field. Formal models of norms can be captured using deontic logic; research in normative systems considers the
capturing of norms in agents [
] and human-robot interactions [
]. Social context also includes social values
represented in Higher-Level Extrinsic Goals. Solution concepts [ ] give us one way to formalise these. These
can be directly learned at the reflective layer by the agent through reflective learning. An AI system able to reflect
on its actions in terms of social context would need to draw on formal models such as these. Work on normative
reasoning in open MAS could play a crucial role, ranging from negotiation between individuals to engineering electronic
institutions [80, 81, 82].
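As a toy illustration of how context-scoped norms might be represented and checked at the reflective layer (an illustrative data structure, not a deontic-logic formalism):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    """A context-scoped prohibition or obligation on actions."""
    context: str  # where the norm is active (e.g., 'library')
    kind: str     # 'forbidden' or 'obliged'
    action: str


def permitted(action: str, context: str, norms: list) -> bool:
    """An action is permitted unless a norm active in this context forbids it."""
    return not any(n.kind == "forbidden" and n.context == context and n.action == action
                   for n in norms)


def outstanding_obligations(done: set, context: str, norms: list) -> set:
    """Obligations active in this context that have not yet been discharged."""
    return {n.action for n in norms
            if n.kind == "obliged" and n.context == context and n.action not in done}
```

Even this toy form shows the key property: the same action can be permitted in one context and forbidden in another, so the reflective layer must know which norms are currently active.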
Research challenge:
There is a need to develop the semantics and nested abstract models to refine the approaches
described in [
], by integrating socio-cognitive, communication and normative components inside the
instantiated internal simulations. Reflective agents should be able to also simulate the minds and behaviours of other
agents and organisations in various contexts where different norms are active.
7.5 Meaningful Interaction
If a machine is sufficiently endowed with all the previous reflective capabilities, then the ultimate capability of such a
reflective machine would be to interact meaningfully with humans. In Hamblin’s words, this means to have a machine
‘worth talking to’ [
]. One way (and there are others) we might want to judge whether a machine is worth talking to is by
having it truly pass a Turing test. To be able to do this, the machine must be capable of human-like deception.
There have been cases of machines that have successfully deceived judges to pass the Turing test, most notably the
Loebner prize winners. However, the type of deception employed by these machines makes us doubt their intelligence
and reflective capabilities. If machines are designed and pre-programmed by humans to pass the Turing test, is the
deception performed by the humans or the machine? The Loebner prize winning machines are simple chatterbots, which
follow a pre-defined script given to them by their designers [ ]. These machines do not possess the ability to deliberate
in order to decide what to say, how to say it, or when to say it in different circumstances. Such machines are merely
vessels of deception, performed by humans, and incapable of tricking Turing test judges into believing that they are
intelligent on their own. In this sense, Large Language Models (LLMs) are the equivalent of ‘stochastic parrots’.
On the other hand, instead of applying script-based policies or ‘parroting’ algorithms for tricking humans into believing
they are engaging in meaningful interactions, reflective machines might actually be able to use their abstract models
of the world and others to give semantics to their utterances or even to their non-linguistic behaviour, similarly to
the deceptive AI agents proposed in [
], which have internal ‘consequence engines’ that simulate the outcomes of
communicative interactions w.r.t. false beliefs formed in the minds of their interlocutor agents. These sorts of agents
not only have the ability to model other agents behaviourally, as explored in the special issue edited by [
], but have
the ability to use an Artificial Theory of Mind to reason about consequences of their actions on the minds of others. For
instance, the agents in [
] use a combination of simulation theory of mind (ST) and theory-theory of mind (TT) to
reason about how they can cause changes in the beliefs of others and reason about the consequences of these belief
changes. Similarly, Winfield’s robots use it to predict the actions of other agents and anticipate the likely consequences
of those actions both for themselves and the other agents [90].
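The consequence-engine style of reasoning about an interlocutor's mind can be caricatured as simulating the belief update an utterance would cause, then checking the simulated result against a goal for the interaction. The belief representation and update rule below are deliberately simplistic assumptions, not the cited agents' actual mechanism:

```python
def simulate_belief_update(listener_beliefs: dict, utterance: tuple) -> dict:
    """Predict the listener's beliefs after hearing `utterance`.

    Simplistic assumption: an utterance asserts (proposition, value) with a
    given speaker trust; the listener adopts the asserted value unless they
    hold their current belief more strongly than they trust the speaker.
    """
    prop, value, speaker_trust = utterance
    beliefs = dict(listener_beliefs)
    _, strength = beliefs.get(prop, (None, 0.0))
    if speaker_trust >= strength:
        beliefs[prop] = (value, speaker_trust)
    return beliefs


def choose_utterance(listener_beliefs: dict, candidates: list, goal) -> tuple:
    """Artificial theory of mind as a consequence engine: pick the utterance
    whose *simulated* effect on the listener best satisfies the goal."""
    return max(candidates,
               key=lambda u: goal(simulate_belief_update(listener_beliefs, u)))
```

The same machinery serves honest and deceptive communication alike: what differs is only whether the goal function rewards inducing true or false beliefs in the interlocutor.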
Taking Dennett’s perspective on what language does for Popperian agents, tackling this research challenge also means
enabling the design of AI agents that can use language and arguments to ‘set the stage’ for their consequence engines
and contrast and compare the results of their internal simulations, not just inside their own minds, but together with
humans. Such Popperian AI agents will then be capable of using language to communicate relevant information [ ].
Finally, this would allow humans and reflective agents to make sense of the world together. Such a future would mean
that AI would cease to be merely a tool for scientific discovery and start acting as a scientist, by fostering deliberation
and debate and proposing testable explanations of surrounding phenomena, in line with the description of explainable
AI in [ ], and similar to the idea of transformationally creative AI in [93].
Since we mentioned the ability to use language meaningfully for deception, this could also imply the creation of
Gregorian agents that use language to shape the models of their world in various explorative ways. One example of
meaningful interaction between human and AI Gregorian agents would be using their consequence engines to explore
various fictitious models that could serve as world-building, or story-building; the work in [ ] even provides an ethical
account for deceptive machines to be used in the scope of entertainment. Gregorian AI agents would go beyond fostering
debate about observable facts in the physical world; they could instead foster exercises of imagination and creativity, and
perhaps even new philosophical perspectives on our value systems. This latter potential of Gregorian AI agents
touches on Boden's concept of Transformational AI, but compared to the Popperian agents, Gregorian agents are also
able to evaluate social constructs that emerged from language use.
Research challenge:
When agents talk about things in a human-interpretable manner, they need to make sense, not talk
nonsense or blabber. First, work on speech acts and agent communication languages has to be further developed to
enable agents to extract and refer to linguistic semantics from their abstract models of the world [ ]. Second, methods
based on natural language processing and argumentation [ ] need to be developed for agents to be able to perform
abstract conceptualisation from linguistic data. Finally, there is a need to integrate dialogue-based argumentation
frameworks [ ] for reflective agents to be able to form sound and consistent arguments, and even tell meaningful
stories when interacting with others [ ], without resorting to pre-defined scripts [ ] or learnt statistical patterns
(stochastic ‘parroting’).
8 Conclusion
Much research in AI is concerned with breaking a problem down until its constituent parts are solvable; this is important
work. Conversely, linking these things together again in an agent-centric fashion to create the sorts of complex mind-like
phenomena that motivated us in the first place, is just as crucial. As we have sketched above, there is a lot to draw on in
conceiving and building reflective AI systems. Yet a lot of research remains in understanding how to put together the
pieces of the puzzle. Some aspects of reflection are present in the established agent architectures and argumentation
models for normative reasoning, deliberation, practical reasoning, and communication. After all, reflection is a crucial
component of social interaction, cooperation, and reasoning about what others know and how they might act in different
contexts.
Reflective AI will be no silver bullet to the problems raised at the beginning of this paper. Yet it may be a step towards
more deliberate, careful, and trustworthy AI, including in those sorts of contexts, if we want to take it.
Finally, we ought to clarify that reflective AI would absolutely not be an excuse to avoid doing AI responsibly, nor does
it represent a technical fix to what is fundamentally a social problem. Delegating reflective mental capabilities neither
does nor can obviate human responsibility, nor should it distract from it. For example, when building and deploying
AI systems, sadly too little attention is often paid to making them context-sensitive, to understanding stakeholders
and operational conditions, to requirements analysis, to understanding bias in data and how it might be amplified, to
transparency about training sets, and to interpretability. What we are proposing here is not an either-or. But it is, we
hope, a step towards a more complete, less unbalanced conceptualisation of AI.
[1] Margaret A Boden. AI: Its Nature and Future. Oxford University Press, 2016.
[2] Virginia Dignum and Frank Dignum. Agents are dead. Long live agents! In Proceedings of the 19th International
Conference on Autonomous Agents and MultiAgent Systems, pages 1701–1705, 2020.
[3] Ron Sun. Cognitive science meets multi-agent systems: A prolegomenon. Philosophical Psychology, 14(1):5–28.
[4] Antonio Lieto. Cognitive design for artificial minds. Routledge, 2021.
[5] Jeremy Pitt, editor. The Computer After Me. Imperial College Press / World Scientific, 2014.
[6] M. Tine. Uncovering a differentiated Theory of Mind in children with autism and Asperger syndrome. PhD thesis,
Boston College, USA, 2009.
[7] Richard Bellman. An introduction to artificial intelligence: Can computers think? Boyd & Fraser, San Francisco.
[8] Dagmar Monett, Colin W. P. Lewis, Kristinn R. Thórisson, Joscha Bach, Gianluca Baldassarre, Giovanni Granato,
Istvan S. N. Berkeley, François Chollet, Matthew Crosby, Henry Shevlin, John Fox, John E. Laird, Shane Legg,
Peter Lindes, Tomáš Mikolov, William J. Rapaport, Raúl Rojas, Marek Rosa, Peter Stone, Richard S. Sutton,
Roman V. Yampolskiy, Pei Wang, Roger Schank, Aaron Sloman, and Alan Winfield. Special issue “On defining
artificial intelligence”: commentaries and author's response. Journal of Artificial General Intelligence, 11:1–100.
[9] Peter R. Lewis and Stephen Marsh. What is it like to trust a rock? A functionalist perspective on trust and
trustworthiness in artificial intelligence. Cognitive Systems Research, 72:33–49, 2021.
[10] Reuters. Amazon ditched AI recruiting tool that favored men for technical jobs. The Guardian, Oct. 11, 2018.
[11] Donald Russell, George Cawkwell, Werner Deuse, John Dillon, Heinz-Günther Nesselrath, Robert Parker,
Christopher Pelling, and Stephan Schröder. On the daimonion of Socrates: Plutarch. SAPERE. Mohr Siebeck
GmbH and Co. KG, 2010.
[12] Plato (translated by Paul Shorey). Plato in Twelve Volumes, volumes 5 & 6. Harvard University Press, Cambridge,
MA, USA, 1969.
[13] Daniel C Dennett. The role of language in intelligence. Walter de Gruyter, 2013.
[14] Germund Hesslow. Conscious thought as simulation of behaviour and perception. Trends in Cognitive Sciences,
6(6):242–247, 2002.
[15] Germund Hesslow. The current status of the simulation theory of cognition. Brain research, 1428:71–79, 2012.
[16] Donald A. Schön. The Reflective Practitioner: How Professionals Think In Action. Basic Books, 1984.
[17] David A. Kolb. Experiential Learning: Experience as the Source of Learning and Development. Prentice-Hall,
Englewood Cliffs, NJ, USA, 1984.
[18] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. 4th edition (global edition), Pearson, 2021.
[19] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,
and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014.
[20] Michael P. Georgeff and Amy L. Lansky. Reactive reasoning and planning. In AAAI, volume 87, pages 677–682.
[21] John R. Anderson, Michael Matessa, and Christian Lebiere. ACT-R: A theory of higher level cognition and its
relation to visual attention. Human–Computer Interaction, 12(4):439–462, 1997.
[22] Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin
of Mathematical Biophysics, 5(4):115–133, 1943.
[23] Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain.
Psychological Review, 65(6):386, 1958.
[24] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[25] Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Christopher J. Anders, and Klaus-Robert Müller.
Explaining deep neural networks and beyond: A review of methods and applications. Proceedings of the IEEE,
109(3):247–278, 2021.
[26] Aaron Sloman and Ron Chrisley. Virtual machines and consciousness. Journal of Consciousness Studies,
10:133–172, 2003.
[27] Aaron Sloman. Virtual machine functionalism: The only form of functionalism worth taking seriously in
philosophy of mind. 2013.
[28] Aaron Sloman. What is it like to be a rock? 1996.
[29] Brian Cantwell Smith. Reflection and semantics in Lisp. In Proceedings of the 11th ACM SIGACT-SIGPLAN
Symposium on Principles of Programming Languages, pages 23–35, 1984.
[30] Anand S. Rao, Michael P. Georgeff, et al. BDI agents: From theory to practice. In ICMAS, volume 95, pages
312–319, 1995.
[31] Brian Cantwell Smith. Procedural reflection in programming languages. PhD thesis, Massachusetts Institute of
Technology, 1982.
[32] Lavindra De Silva, Felipe Meneguzzi, and Brian Logan. BDI agent architectures: A survey. International Joint
Conferences on Artificial Intelligence, 2020.
[33] Sam Leask and Brian Logan. Programming agent deliberation using procedural reflection. Fundamenta
Informaticae, 158(1-3):93–120, 2018.
[34] Daniel C. Dennett. Why the law of effect will not go away. Journal for the Theory of Social Behaviour, 5:169–187.
[35] Christopher Landauer and Kirstie L. Bellman. Wrappings for software development. In Proceedings of the
Thirty-First Hawaii International Conference on System Sciences, volume 3, pages 420–429, 1998.
[36] Frances Brazier and Jan Treur. Formal specification of reflective agents. In IJCAI ’95 Workshop on Reflection.
[37] Christian Blum, Alan F. T. Winfield, and Verena V. Hafner. Simulation-based internal models for safer robots.
Frontiers in Robotics and AI, 2018.
[38] Peter R. Lewis, Arjun Chandra, Funmilade Faniyi, Kyrre Glette, Tao Chen, Rami Bahsoon, Jim Torresen, and Xin
Yao. Architectural aspects of self-aware and self-expressive computing systems. IEEE Computer, 48:62–70, 2015.
[39] Samuel Kounev, Peter Lewis, Kirstie Bellman, Nelly Bencomo, Javier Camara, Ada Diaconescu, Lukas Esterle,
Kurt Geihs, Holger Giese, Sebastian Göetz, Paola Inverardi, Jeffrey Kephart, and Andrea Zisman. The notion of
self-aware computing. In Samuel Kounev, Jeffrey O. Kephart, Aleksandar Milenkoski, and Xiaoyun Zhu, editors,
Self-Aware Computing Systems, pages 3–16. Springer, 2017.
[40] Alain Morin. Levels of consciousness and self-awareness: A comparison and integration of various neurocognitive
views. Consciousness and Cognition, 15:358–371, 2006.
[41] John McCarthy. Making robots conscious of their mental states. In Machine Intelligence 15, Intelligent Agents
[St. Catherine’s College, Oxford, July 1995], pages 3–17. Oxford University, 1999.
[42] Melanie Mitchell. Self-awareness and control in decentralized systems. In Metacognition in Computation, pages
80–85, 2005.
[43] Caio A. Lage, De Wet Wolmarans, and Daniel C. Mograbi. An evolutionary view of self-awareness. Behavioural
Processes, 194:104543, 2022.
[44] Peter R. Lewis, Arjun Chandra, Shaun Parsons, Edward Robinson, Kyrre Glette, Rami Bahsoon, Jim Torresen,
and Xin Yao. A survey of self-awareness and its application in computing systems. In Proceedings of the
International Conference on Self-Adaptive and Self-Organizing Systems Workshops (SASOW), pages 102–107.
IEEE Computer Society, 2011.
Peter R. Lewis, Arjun Chandra, Funmilade Faniyi, Kyrre Glette, Tao Chen, Rami Bahsoon, Jim Torresen, and Xin
Yao. Architectural aspects of self-aware and self-expressive computing systems. IEEE Computer, 48:62–70, 2015.
P. R. Lewis, M. Platzner, B. Rinner, J. Torresen, and X. Yao. Self-Aware Computing Systems: An Engineering
Approach. Springer, 2016.
Ulric Neisser. The roots of self-knowledge: Perceiving self, it, and thou. Annals of the New York Academy of
Science, 818:19–33, 1997.
Peter R. Lewis, Kirstie L. Bellman, Christopher Landauer, Lukas Esterle, Kyrre Glette, Ada Diaconescu, and
Holger Giese. Towards a framework for the levels and aspects of self-aware computing systems. In Samuel
Kounev, Jeffrey O. Kephart, Aleksandar Milenkoski, and Xiaoyun Zhu, editors, Self-Aware Computing Systems,
pages 3–16. Springer, 2017.
K. Bellman, C. Landauer, N. Dutt, L. Esterle, A. Herkersdorf, A. Jantsch, N. TaheriNejad, P. R. Lewis, M. Platzner,
and K. Tammemäe. Self-aware cyber-physical systems. ACM Transactions on Cyber-Physical Systems, 4(4),
Christopher Landauer and Kirstie L. Bellman. Reflective systems need models at run time. In Sebastian Götz,
Nelly Bencomo, Kirstie L. Bellman, and Gordon S. Blair, editors, Proceedings of the 11th International Workshop
on Models@run.time co-located with 19th International Conference on Model Driven Engineering Languages and
Systems (MODELS 2016), Saint Malo, France, October 4, 2016, volume 1742 of CEUR Workshop Proceedings,
pages 52–59., 2016.
Kirstie L. Bellman, Christopher Landauer, Phyllis Nelson, Nelly Bencomo, Sebastian Götz, Peter R. Lewis, and
Lukas Esterle. Self-modeling and self-awareness. In Samuel Kounev, Jeffrey O. Kephart, Aleksandar Milenkoski,
and Xiaoyun Zhu, editors, Self-Aware Computing Systems, pages 3–16. Springer, 2017.
Abdessalam Elhabbash, Rami Bahsoon, Peter Tino, Peter R Lewis, and Yehia Elkhatib. Attaining meta-self-
awareness through assessment of quality-of-knowledge. In 2021 IEEE International Conference on Web Services
(ICWS), pages 712–723. IEEE Computer Society, 2021.
Danny Weyns, M Usman Iftikhar, and Joakim Söderlund. Do external feedback loops improve the design of
self-adaptive systems? A controlled experiment. In 2013 8th International Symposium on Software Engineering
for Adaptive and Self-Managing Systems (SEAMS), pages 3–12. IEEE, 2013.
Roberta Calegari, Giovanni Ciatto, and Andrea Omicini. On the integration of symbolic and sub-symbolic
techniques for XAI: A survey. Intelligenza Artificiale, 14(1):7–32, 2020.
Mitchell A Potter and Kenneth A De Jong. A cooperative coevolutionary approach to function optimization. In
International Conference on Parallel Problem Solving from Nature, pages 249–257. Springer, 1994.
Shuo Wang, Georg Nebehay, Lukas Esterle, Kristian Nymoen, and Leandro L. Minku. Common techniques for
self-awareness and self-expression. In Peter R. Lewis, Marco Platzner, Bernhard Rinner, Jim Tørresen, and Xin
Yao, editors, Self-Aware Computing Systems: An Engineering Approach, pages 113–142. Springer, 2016.
Simon T Powers, Anikó Ekárt, and Peter R Lewis. Modelling enduring institutions: The complementarity of
evolutionary and agent-based approaches. Cognitive Systems Research, 52:67–81, 2018.
Frances MT Brazier and Jan Treur. Compositional modelling of reflective agents. International Journal of
Human-Computer Studies, 50(5):407–431, 1999.
Ana-Maria Olteţeanu, Mikkel Schöttner, and Arpit Bahety. Towards a multi-level exploration of human and
computational re-representation in unified cognitive frameworks. Frontiers in Psychology, 10:940, 2019.
James H. Moor. Four kinds of ethical robots. Philosophy Now, 72:12–14, 2009.
Alan F. Winfield, Katina Michael, Jeremy Pitt, and Vanessa Evers. Machine ethics: The design and governance of
ethical AI and autonomous systems [scanning the issue]. Proceedings of the IEEE, 107(3):509–517, 2019.
José-Antonio Cervantes, Sonia López, Luis-Felipe Rodríguez, Salvador Cervantes, Francisco Cervantes, and Félix
Ramos. Artificial moral agents: A survey of the current status. Science and Engineering Ethics, 26(2):501–532.
Alan FT Winfield and Marina Jirotka. Ethical governance is essential to building trust in robotics and artificial
intelligence systems. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering
Sciences, 376(2133):20180085, 2018.
Dieter Vanderelst and Alan F. T. Winfield. The dark side of ethical robots. In AAAI/ACM Conference on AI Ethics
and Society, pages 317–322, 2018.
Corinne Cath. Governing artificial intelligence: ethical, legal and technical opportunities and challenges. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376(2133):20180080.
Katie Atkinson, Trevor Bench-Capon, and Peter McBurney. Multi-agent argumentation for eDemocracy. In
EUMAS, pages 35–46, 2005.
Katie Atkinson and Trevor Bench-Capon. Practical reasoning as presumptive argumentation using action based
alternating transition systems. Artificial Intelligence, 171(10-15):855–874, 2007.
Katie Atkinson and Trevor Bench-Capon. States, goals and values: Revisiting practical reasoning. Argument &
Computation, 7(2-3):135–154, 2016.
Katie Atkinson and Trevor Bench-Capon. Value-based argumentation. Journal of Applied Logics, 8(6):1543–1588.
Elizabeth I Sklar, Mohammad Q Azhar, Simon Parsons, and Todd Flyr. A case for argumentation to enable
human-robot collaboration. Proceedings of Autonomous Agents and Multiagent Systems (AAMAS), St Paul, MN,
USA, 2013.
Douglas Walton, Christopher Reed, and Fabrizio Macagno. Argumentation schemes. Cambridge University Press.
Francesca Mosca, Ştefan Sarkadi, Jose M Such, and Peter McBurney. Agent EXPRI: Licence to explain. In
International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems, pages 21–38.
Springer, 2020.
Francesca Mosca and Jose Such. ELVIRA: An explainable agent for value and utility-driven multiuser privacy. In
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2021.
Ştefan Sarkadi, Alison R Panisson, Rafael H Bordini, Peter McBurney, and Simon Parsons. Towards an approach
for modelling uncertain theory of mind in multi-agent systems. In International Conference on Agreement
Technologies, pages 3–17. Springer, 2018.
Peter McBurney and Michael Luck. The agents are all busy doing stuff! IEEE Intelligent Systems, 22(4):6–7.
Louise A Dennis and Nir Oren. Explaining BDI agent behaviour through dialogue. In Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021). International Foundation for
Autonomous Agents and Multiagent Systems (IFAAMAS), 2021.
Natalia Criado, Estefania Argente, and V Botti. Open issues for normative multi-agent systems. AI Communications,
24(3):233–264, 2011.
Stephen Cranefield and Bastin Tony Roy Savarimuthu. Normative multi-agent systems and human-robot interaction.
In Workshop on Robot Behavior Adaptation to Human Social Norms (TSAR), 2021.
Sevan Gregory Ficici. Solution Concepts in Coevolutionary Algorithms. PhD thesis, Brandeis University, 2004.
Carles Sierra, Nick R Jennings, Pablo Noriega, and Simon Parsons. A framework for argumentation-based
negotiation. In International Workshop on Agent Theories, Architectures, and Languages, pages 177–192.
Springer, 1997.
Carles Sierra, Juan Antonio Rodriguez-Aguilar, Pablo Noriega, Marc Esteva, and Josep Lluis Arcos. Engineering
multi-agent systems as electronic institutions. European Journal for the Informatics Professional, 4(4):33–39.
Jeremy Pitt, Julia Schaumeier, and Alexander Artikis. Axiomatization of socio-economic principles for self-
organizing institutions: Concepts, experiments and challenges. ACM Transactions on Autonomous and Adaptive
Systems (TAAS), 7(4):1–39, 2012.
Natalia Criado, Estefania Argente, and V Botti. A BDI architecture for normative decision making. In Proceedings
of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Volume 1, pages
1383–1384, 2010.
Natalia Criado. Using norms to control open multi-agent systems. AI Communications, 26(3):317–318, 2013.
Phillip Staines. Linguistics and the Parts of the Mind: Or how to Build a Machine Worth Talking to. Cambridge
Scholars Publishing, 2018.
Michael L Mauldin. Chatterbots, tinymuds, and the Turing test: Entering the Loebner prize competition. In AAAI,
volume 94, pages 16–21, 1994.
Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of
stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness,
Accountability, and Transparency, pages 610–623, 2021.
Ştefan Sarkadi. Deception. PhD thesis, King’s College London, 2021.
Stefano V Albrecht and Peter Stone. Autonomous agents modelling other agents: A comprehensive survey and
open problems. Artificial Intelligence, 258:66–95, 2018.
Alan FT Winfield. Experiments in artificial theory of mind: From safety to story-telling. Frontiers in Robotics
and AI, 5:75, 2018.
Deirdre Wilson and Dan Sperber. Meaning and relevance. Cambridge University Press, 2012.
Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence,
267:1–38, 2019.
Margaret A Boden. Creativity and artificial intelligence. Artificial Intelligence, 103(1-2):347–356, 1998.
Mark Coeckelbergh. How to describe and evaluate “deception” phenomena: recasting the metaphysics, ethics,
and politics of ICTs in terms of magic and performance and taking a relational and narrative turn. Ethics and
Information Technology, 20(2):71–85, 2018.
Philip R Cohen and Hector J Levesque. Communicative actions for artificial agents. In ICMAS, volume 95, pages
65–72. AAAI, 1995.
Elena Cabrio and Serena Villata. Natural language arguments: A combined approach. In ECAI 2012, pages
205–210. IOS Press, 2012.
John Lawrence and Chris Reed. Argument mining: A survey. Computational Linguistics, 45(4):765–818, 2020.
Ştefan Sarkadi, Peter McBurney, and Simon Parsons. Deceptive storytelling in artificial dialogue games. In
Proceedings of the AAAI 2019 Spring Symposium Series on Story-Enabled Intelligence, 2019.
Roger C Schank and Robert P Abelson. Scripts, plans, and knowledge. In IJCAI, volume 75, pages 151–157.