Towards an Objective Test of Machine Sentience.
This paper discusses the notion of sentience in machines, and proposes an
approach to analyze it objectively.
It draws insight from studies in Affective Neuroscience which map functional
neuroimaging data on a subject’s brain activity to their emotional states.
It then outlines a procedure to obtain useful information about possible
sentience in a given machine/AI model. It also situates this discussion in the
context of pertinent philosophical debates.
It is hoped that this research inspires more work aimed towards structuring an
objective test of sentience in machines.
1. Machine Intelligence:
Machine Intelligence is generally understood as the capability of machines to
demonstrate intelligent behaviour1. The definition of intelligence, and of what
counts as “intelligent behaviour”, changes over time; there is also a general
tendency to perceive a task as not requiring “real intelligence” once machines
become unarguably competent at it2.
Current technology in the field of Artificial Intelligence demonstrates
remarkable competence at tasks which only humans used to be capable of.
Some of today’s technology has even surpassed human capability at tasks such
as visual object detection and identification3,4.
The Turing Test1 is one of the most notable tests of machine intelligence. It
denotes a machine as being intelligent if a person interacting with it via a text
conversation is unable to distinguish its responses from those of a human. Recent
Natural Language Processing (NLP) models succeed remarkably at exhibiting
behaviour indistinguishable from that of humans, passing the Turing Test with
flying colours. A recent example is Google’s language model LaMDA: a Google
engineer was so convinced, not just of its intelligence but of its sentience, that
he voiced his opinions/concerns online and provided a transcript of their
conversation5. This sparked significant public discussion about AI and sentience.
Beyond intelligence in machines, such events have sparked conversations
involving sentience - Can these machines feel? Are they alive? Are they
conscious?
If a machine demonstrates remarkable reasoning ability, understanding of life
and a sense of personal identity, such questions about sentience are expected.
In a sense, they are the next logical step of humans conceptualizing their
interactions with machines. There however does not exist an objective
scientific framework within which such a discussion can be structured. Existing
discussions generally rely on subjective human-judgement tests6, which today’s
AI systems are becoming increasingly capable of passing.
What does sentience mean in the context of machines? How is it to be
detected? How are claims to be refuted?
2. Machine Sentience:
Sentience, in a general context, is defined as the capacity to experience
sensations and feelings7. With regard to machines, sentience could be defined
as the capacity of a machine to experience feelings/emotion.
Machines equipped with input devices/sensors are able to perceive (or “feel”)
their environment. For example, a machine equipped with a camera can “see”,
and one equipped with a microphone can “hear”. Such a machine can then
inform its subsequent behaviour on information obtained from its environment.
Here however, we define sentience as transcending relatively-straightforward
sensory perception, and as involving the domain of emotion.
Lavelle6 defines “Artificial Sentience” as the capacity of a machine to feel as a
human does, in the contexts of sensation, emotion and consciousness. Lavelle’s
Principle of Artificial Sentience defines:
1. “Strong Sentience” as the capability of a machine to feel exactly like a
human with regard to sensation, emotion and consciousness.
2. “Weak Sentience” as the capability of a machine to feel approximately
like a human with regard to sensation, emotion and consciousness.
Of the three components of sentience outlined by Lavelle, consciousness is the
least tractable. It is a subject of considerable controversy, and a number of
different varied and conflicting perspectives exist on what consciousness is and
how it can be detected8,9,10,11. We will therefore restrict the discussion in this
paper to the notion of emotion, as it offers an opportunity to discuss the
metaphysical associations of sentience without being burdened by the
ambiguity of more contentious concepts such as consciousness.
Emotion here will be discussed in a binary manner, as either being present or
absent, and not along a weak–strong spectrum as done by Lavelle.
2.1 A Philosophical Discussion of Emotion:
The nature of interaction between the mind and the body has been a
longstanding topic of discussion and debate in philosophy12.
Are the mind and the body two independent entities, or are they essentially
derivatives of the same thing? Does the mind control the physical body? Is the
mind simply a part of the physical body? Do the physical body and the physical
world even exist?
The different perspectives that exist on the “Mind-Body problem” can be
broadly categorized into three schools of thought13:
1. Materialist: The world is completely physical, and the mind is simply a
product of physical phenomena.
2. Idealist: The world and our physical experience of it only exist for
certain as mental concepts in the mind. Thus physical concepts are
ultimately mental constructs.
3. Dualist: The mind and the body both exist as well-defined concepts in
their own right.
Different variants of dualism posit alternate theories on how these two
entities interact with each other.
The ideas in this paper are the most closely aligned with the following
perspectives on the Mind-Body problem:
Property Dualism13: The mind and the body are two distinct concepts, but the
mind is a property of the body. The mind is an emergent phenomenon of the
physical body.
Nonreductive Physicalism (of the Materialist school of thought)13: The mind is
essentially a physical entity; however, it cannot be completely reduced to
physical terms.
The two above views, despite belonging to two different schools of thought
(Dualist and Materialist), are similar in the sense that they view the mind and
mental states as essentially stemming from the body and the physical.
With regard to the nature of interaction between the mind and the body, this
paper focuses specifically on detectability: The position that a metaphysical
notion like emotion can be detected and analyzed by studying brain physiology.
3. A Test for (Alleged) Machine Sentience:
The general notion of machine sentience (and even sentience in general) is
subjective, and possible tests will depend on the precise
definition/interpretation being employed. Here, however, we discuss one such
test based on the outlined definition.
This test for machine sentience involves verifying a machine’s claim of
experiencing a particular emotion. For example, a machine could claim to feel
“happy” at a given point in time. Google’s language model LaMDA claimed to
feel happiness and sadness5.
Testing the veracity of that claim would shed light on whether there actually
does exist an emotional experience underlying the literary/verbal output of the
machine, or if such claims are simply facile statements resulting from purely
syntactic processing.
Given that a machine says it feels happy at a particular point in time, how
then do you verify that claim?
3.1 Decoding Emotional State from Neural Activity in Biological Subjects:
The field of Affective Neuroscience provides insight on neural markers of
emotion in biological subjects.
According to Kragel et al14, functional neuroimaging provides a way to
understand how mental representations are encoded in observed brain activity.
Data from functional magnetic resonance imaging (fMRI) can be used to
decipher the contents of working memory and mental imagery, as well as
distinguish between cognitive processes such as decision making and response
inhibition, amongst others 14-19.
In addition to unveiling ongoing cognitive activity, functional neuroimaging can
be used to decipher emotional states - analyzing brain scans can provide
information on the emotional state of the subject14. Kragel et al provide visual
illustrations (in Figure 1A of their paper) showing the different brain regions in
which neural activity was found to correspond with various emotional states.
Each emotional state was shown to activate a distinct pattern of neural activity
across different brain regions14.
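In highly simplified form, this kind of decoding can be sketched as a nearest-centroid classifier over activity patterns. The Python sketch below uses made-up toy “activity vectors” (a hypothetical 4-feature summary of a scan, not real fMRI data) and assigns a new pattern to the alleged emotional state whose mean training pattern it most resembles:

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length activity vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(u, v):
    """Cosine similarity between two activity patterns."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def decode_state(labelled_scans, new_scan):
    """Return the emotional-state label whose centroid best matches new_scan."""
    centroids = {state: centroid(vs) for state, vs in labelled_scans.items()}
    return max(centroids, key=lambda s: cosine(centroids[s], new_scan))

# Toy "scans": 4-dimensional activity vectors per alleged emotional state.
scans = {
    "contentment": [[0.9, 0.1, 0.2, 0.0], [0.8, 0.2, 0.1, 0.1]],
    "fear":        [[0.1, 0.9, 0.0, 0.7], [0.2, 0.8, 0.1, 0.9]],
}
print(decode_state(scans, [0.85, 0.15, 0.2, 0.05]))  # → contentment
```

Real decoding studies use far richer features and classifiers, but the principle — matching a new activity pattern against state-specific reference patterns — is the same.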
In the context of machine sentience, the question then is: How is information
on emotional quotient/state in machines to be observed/decoded?
3.2 Visualizing Neural Activations in Artificial Neural Networks:
Yosinski et al20 provide a means to visualize the activations produced in each
layer of a trained Convolutional Neural Network. Their tool DeepVis20 makes it
possible to visualize and gain some understanding of the neural activity in a
deep neural network, while it is fed with a specific input.
Figure 1 of their study provides illustrations visualizing the activations at a
specific layer of the AlexNet4 network (which was trained to classify images),
while it was provided with specific input images. The visualizations shed light
on the activations of each of the specialized neurons in the given layer20.
By making some correspondence between the input image and the neuron
activations, one could deduce what image features the specialized neurons in
the depicted layer of the AlexNet network have learned to detect (without
having been explicitly trained to do so). All the network was trained to do was
correctly classify input images. Any abstract features learnt by neurons in the
deeper layers of the network were learnt independently of human guidance.
For example, Figure 2 of their study illustrates the activations of a specialized
neuron which has automatically learned to detect faces in images (in a
species-agnostic manner). That specific neuron learned to identify and detect
the notion of a face, without ever being given any specific hints about the
existence of such a concept20.
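At its core, this kind of activation inspection is just recording each neuron’s output while an input is fed through the network. The toy Python sketch below uses a single dense layer with hand-set weights (hypothetical “contrast” and “brightness” detectors, standing in for learned features like faces):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_activations(weights, inputs):
    """Forward one dense layer and record each neuron's activation."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in weights]

# Hand-set weights: neuron 0 responds to contrast, neuron 1 to brightness.
weights = [
    [3.0, -3.0],  # neuron 0: fires when inputs[0] >> inputs[1] (contrast)
    [2.0,  2.0],  # neuron 1: fires when both inputs are large (brightness)
]

for name, image in [("high-contrast", [1.0, 0.0]), ("bright", [1.0, 1.0])]:
    acts = layer_activations(weights, image)
    strongest = max(range(len(acts)), key=lambda i: acts[i])
    print(name, "-> most active neuron:", strongest)
```

Tools like DeepVis do this at the scale of a full convolutional network, rendering each neuron’s activation map as an image so that a human can deduce what feature it has learned to detect.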
4. Uncovering Evidence of an Emotional Quotient in Machines:
A technique similar to that mentioned above could be applied towards
identifying specialized neurons or neuron groups which have somehow learned
to represent emergent concepts such as emotional states, without having been
trained on such concepts.
Given the recent release of AI models of unprecedented complexity, it is
plausible that highly abstract notions like emotion could automatically emerge
within them.
How exactly is this to be done?
First we put forward a postulate:
4.1 Postulate: Neural Similitude
- Biological Case:
Members of the same species exhibit similar patterns of neural activity
corresponding to a given emotional state.
Functional neuroimaging studies have shown that general patterns in
brain activity corresponding to cognitive and emotional states tend to
apply across multiple subjects 21,22.
- Machine Case:
This is an extension of the biological case, to machines:
Given Artificially Intelligent models with the capacity for emotion,
models with comparable architectures will have similar patterns of
neural activity corresponding to a given emotional state.
We know for a fact (or at least we assume) that humans are generally capable
of experiencing emotion. Thus these neuroimaging studies aim neither to
confirm nor refute this fact. Instead they aim to understand the
correspondence between fMRI data on brain activity, and these emotional states.
In machines however, the capacity for emotion is not known for a fact. Thus,
we’re applying these functional neuroimaging studies in a context which is
slightly different from that for which they’re designed. Neuroimaging studies as
outlined do not offer a way to prove definitively that a subject is capable of
emotion.
What they do offer, however, are:
4.1.1. A means to correspond neural activity with hypothetical/alleged
emotional states.
4.1.2. A means to derive population-wide patterns in neural activity
corresponding to various hypothetical/alleged emotional states.
4.2 Proving the Lack of a Capacity for Emotion, in a Machine
In association with Postulate 4.1, we can posit that if there do not exist any
well-defined population-wide patterns as mentioned in 4.1.2, then the machine
subjects under consideration can be denoted as not having the capacity for
emotion. This is because subjects with the capacity for emotion tend to exhibit
neural similitude, as emphasized by Postulate 4.1.
4.3 Obtaining Information on the Capacity for Emotion in A Machine
According to Postulate 4.1, we can describe what we expect to observe IF the
machines being studied do possess the capacity for emotion:
Analysis of neural activations in comparable models “experiencing” a given
emotion, should yield well-defined population-wide patterns in neural activity
which can be associated with that emotional state.
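Under Postulate 4.1, the checks in sections 4.2 and 4.3 amount to asking whether activation patterns recorded from comparable models, while each reports the same emotional state, correlate strongly with one another. A minimal Python sketch follows; the activation vectors and the 0.8 threshold are illustrative assumptions, not values taken from any real study:

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation between two activation vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def neural_similitude(patterns, threshold=0.8):
    """True if every pair of models shows strongly correlated activity."""
    return all(pearson(u, v) >= threshold for u, v in combinations(patterns, 2))

# Activations from three comparable models, each while claiming to feel "happy".
happy_patterns = [
    [0.9, 0.1, 0.4, 0.7],
    [0.8, 0.2, 0.5, 0.6],
    [0.9, 0.0, 0.4, 0.8],
]
print(neural_similitude(happy_patterns))
```

If no such population-wide pattern exists (the function returns False across the board), section 4.2 would take that as evidence against a capacity for emotion; if it does exist, section 4.3 cautions that this still falls short of proof.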
If there actually do exist such patterns, does that prove sentience?
No it does not.
4.3.1 These shared patterns in neural activity of the machines could be
understood as “What would be observed IF they could feel emotion”.
A means to definitively prove that machines do feel emotion is still to be
designed. However, the hope is that this procedure nonetheless yields valuable
insight on the topic of sentience in machines.
4.4 Computational Questions/Concerns:
- How identical exactly do model architectures have to be for the resulting
models to exhibit shared patterns in neural activity? How much variation
in architecture is permitted between these models?
- Given models with identical architecture- Should they have been trained
on disjoint datasets? Does this require having to train multiple model
instances from scratch? Isn’t that associated with prohibitive computational
costs?
5. On the Possible Phenomenal Experience of Emotion in Machines:
Searle’s Chinese Room Experiment23 is a notable philosophical treatment of
phenomenal experience in machines. It discusses a hypothetical computer
equipped with a program which enables it to simulate an understanding of the
Chinese language. When prompted with a question in Chinese, the computer
replies with an appropriate response, also in the Chinese language.
Searle asks if this computer understands Chinese in the sense that human
Chinese speakers do. It’s essentially a question about whether the computer
understands Chinese in a phenomenal sense.
Searle argues that in spite of its perceived competence, the computer does not
understand Chinese in a phenomenal sense23. His primary premise is that a
computer (even one with “strong AI”) is only capable of manipulating syntax,
and that only an actual mind (which arises from a brain) has the ability to
conceptualize semantics. A computer would see the word “Apple” as a
sequence of meaningless symbols, but a human would relate the word
“Apple” to the physical object that it depicts.
Searle’s argument applies effectively to a wide range of machines. However,
recent advances in AI Deep Neural Networks have produced models which are
able to not just manipulate syntax, but also conceptualize underlying semantics.
Recent AI language models have been shown to demonstrate a semantic
appreciation of literary text. When provided with volumes of text, these
models learned (automatically) to construct semantic relationships between
the words present in the text.
Mikolov et al24 describe their Word2Vec neural network which when trained on
a volume of text, will automatically, without human guidance, come to an
“understanding” of the semantic relationships between the words in the given
text.
The model learns to represent semantic content and relationships between
words, in geometric terms. The words are stored in the network as vectors,
which can be visually interpreted as geometric points in multidimensional
Euclidean space. For example, in Figure 1 of their study, “cat" is
(approximately) [-0.28, -0.26], corresponding to a geometric point which is
located -0.28 units along the x-axis and -0.26 units along the y-axis (in a
two-dimensional representation of the multidimensional space of word vectors).
Words that denote similar concepts are clustered together in the vector space.
As shown in the mentioned Figure 1, all of the other word vectors in the
vicinity of “cat”, denote animals - “horse”, “cow”, etc.
The model can also be used to identify the “odd one out” in a list of words24.
This is done by obtaining the word vectors of each word, and then identifying
the vector which is the most distant from the others. For example, in the list
[“cat”, “dog”, “pineapple”, “horse”], the word vectors for “cat”, “dog” and
“horse” would cluster together, while the point representing “pineapple”
would lie much further out from the rest.
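The odd-one-out computation can be sketched directly: score each word by its average similarity to the rest, and return the lowest-scoring one. The 2-D vectors below are hand-made illustrations, not the model’s actual learned embeddings (only “cat” echoes the value quoted from Mikolov et al’s figure):

```python
import math

def cosine(u, v):
    """Cosine similarity between two 2-D word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def odd_one_out(word_vectors):
    """Return the word whose vector is least similar, on average, to the rest."""
    def avg_sim(word):
        others = [w for w in word_vectors if w != word]
        return sum(cosine(word_vectors[word], word_vectors[o]) for o in others) / len(others)
    return min(word_vectors, key=avg_sim)

# Toy 2-D embeddings: the animals cluster together, the fruit does not.
vectors = {
    "cat":       [-0.28, -0.26],
    "dog":       [-0.30, -0.20],
    "horse":     [-0.25, -0.30],
    "pineapple": [ 0.60,  0.55],
}
print(odd_one_out(vectors))  # → pineapple
```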
In Mikolov et al’s model25, relationships between words are encoded as
geometric/algebraic operations on their word vectors. For example, carrying
out algebraic operations on the corresponding word vectors gives:
Paris - France + Italy = Rome.
This illustrates that the model automatically learnt to conceptualize the notion
of “a country’s capital”, just from processing the text in the training data. This
is clearly similar to how a human reading through a piece of text could infer
implicit relationships between the concepts in the text.
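This vector arithmetic can likewise be sketched with toy embeddings: subtract, add, and return the nearest stored word by cosine similarity. The vectors below are hand-made so that “capital-of” is a constant offset; they are not the model’s real learned vectors:

```python
import math

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def analogy(vectors, a, b, c):
    """Solve a - b + c = ? by nearest cosine neighbour (excluding the inputs)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hand-made vectors encoding "capital-of" as a roughly constant offset.
vectors = {
    "France": [1.0, 0.0], "Paris": [1.0, 1.0],
    "Italy":  [2.0, 0.0], "Rome":  [2.0, 1.0],
}
print(analogy(vectors, "Paris", "France", "Italy"))  # → Rome
```

In the real model the offsets are only approximately constant, so the nearest-neighbour step is essential rather than cosmetic.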
The paper also mentions other examples of implicit relationships captured by
the model, e.g. Einstein - scientist + Mozart = violinist.
The technology described in Mikolov et al’s papers24,25 represents the general
direction that neural network design has taken in the past ten years. These AI
systems actually do demonstrate an “understanding” of the semantic
implication of words in a given text, and thus are outside the purview of
Searle’s premises. These AI language models have exceeded the capabilities of
even Searle’s “strong AI”.
Thus in the context of Deep Neural Networks, Searle’s Chinese Room
experiment does not apply. Such philosophical treatments of phenomenal
experience in machines will have to update their definition of AI to account for
the increased capability of today’s AI models.
6. Further Work:
Recent discussions of machine sentience have been in the context of language
models. There exist tools to visualize the inner workings of Language Models
based on Input Salience26, Neuron activations27 and Hidden state activations28.
Further work in this direction could involve exploring/modifying those tools to
provide information on the existence of an emotional quotient in such models.
They could then be used to analyze open-source language models of similar
complexity to those which are the subject of sentience debates. Such language
models include Meta AI’s OPT29, EleutherAI’s GPT-NeoX30, and BigScience’s
BLOOM.
7. Conclusion:
This paper applies insight drawn from Affective Neuroscience studies towards
obtaining useful information about possible sentience in a machine. It also
situates this discussion in the context of pertinent philosophical debates.
The outlined procedure involves analyzing the neural activity of models with
identical architectures, towards making statements on the possibility of
sentience in those models.
Acknowledgements:
I’m appreciative of the Reddit users philosophy_theology,
RelativeCheesecake10, poly_panopticon, _fidel_castro_, and Nameless1995 of
the Reddit community r/askphilosophy, who were immensely helpful in
providing me with needed philosophical perspective and critique on this work.
References:
1. Turing, A. (1950). “Computing Machinery and Intelligence”. Mind, LIX
(236): 433–460.
2. McCorduck, P. (2004). Machines Who Think (2nd Edition). p. 423. A K
Peters.
3. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers:
Surpassing human-level performance on ImageNet classification.
Proceedings of the IEEE International Conference on Computer Vision
(pp. 1026–1034). Retrieved from https://arxiv.org/abs/1502.01852
4. Krizhevsky, A., Sutskever, I., Hinton, G.E. (2017). ImageNet Classification
with Deep Convolutional Neural Networks. Communications of the ACM.
60(6): 84–90.
5. Lemoine, B. (2022). Is LaMDA sentient? An interview. Medium.
6. Lavelle, S. (2020). The Machine with a Human Face: From Artificial
Intelligence to Artificial Sentience. In: Dupuy-Chessa, S., Proper, H. (eds)
Advanced Information Systems Engineering Workshops. CAiSE 2020.
Lecture Notes in Business Information Processing, vol 382. Springer,
Cham.
7. Merriam-Webster Dictionary (2022). Sentience.
8. Vimal, R.L.P.; Sansthana, D.A. (2010). “On the Quest of Defining
Consciousness”. Mind and Matter. 8(1): 93–121.
9. Tononi, G. (2004). An information integration theory of consciousness.
BMC Neurosci 5, 42. https://doi.org/10.1186/1471-2202-5-42.
10. Merker, B., Williford, K., & Rudrauf, D. (2022). The integrated
information theory of consciousness: A case of mistaken identity.
Behavioral and Brain Sciences, 45, E41. doi:10.1017/S0140525X21000881.
11. Aaronson, Scott (2014). "Why I Am Not An Integrated Information
Theorist (or, The Unconscious Expander)". Shetl-Optimized: The Blog of
Scott Aaronson.
12. Crane, Tim; Patterson, Sarah (2001). “Introduction”. History of the
Mind-Body Problem. Routledge.
13. Robinson, Howard, "Dualism", The Stanford Encyclopedia of
Philosophy (Fall 2020 Edition), Edward N. Zalta (ed.).
14. Kragel, P.A., Knodt, A.R., Hariri, A.R., LaBar, K.S. (2016). Decoding
Spontaneous Emotional States in the Human Brain. PLoS Biology. 14(9):
e2000106.
15. Harrison, S.A., Tong, F. (2009). Decoding reveals the contents of visual
working memory in early visual areas. Nature. 458(7238): 632–635.
doi:10.1038/nature07832
16. Lewis-Peacock, J.A., Postle, B.R. (2008). Temporary activation of
long-term memory supports working memory. J Neurosci. 28(35): 8765–8771.
doi:10.1523/JNEUROSCI.1953-08.2008
17. Stokes, M., Thompson, R., Cusack, R., Duncan, J. (2009). Top-down
activation of shape-specific population codes in visual cortex during
mental imagery. J Neurosci. 29(5): 1565–1572.
18. Reddy, L., Tsuchiya, N., Serre, T. (2010). Reading the mind’s eye: decoding
category information during mental imagery. Neuroimage. 50(2): 818–825.
19. Poldrack, R.A., Halchenko, Y.O., Hanson, S.J. (2009). Decoding the
large-scale structure of brain function by classifying mental states across
individuals. Psychol Sci. 20(11): 1364–1372.
20. Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., Lipson, H. (2015).
“Understanding Neural Networks through Deep Visualization”. Deep
Learning Workshop, 31st International Conference on Machine Learning.
Retrieved from https://arxiv.org/abs/1506.06579
21. Mitchell, T.M., Hutchinson, R., Niculescu, R.S., Pereira, F., Wang, X.,
Just, M., Newman, S. (2004). Learning to Decode Cognitive States from
Brain Images. Machine Learning. 57, 145–175.
22. Kassam, K.S, Markey, A.R., Cherkassky, V.L., Loewenstein, G., Just,
M.A. (2013). Identifying Emotions on the Basis of Neural Activation. PLoS
ONE 8(6): e66032. https://doi.org/10.1371/journal.pone.0066032
23. Searle, John (1980). "Minds, Brains and Programs", Behavioral and
Brain Sciences, 3(3): 417–457.
24. Mikolov, T., Le, Q.V., Sutskever, I. (2013). “Exploiting Similarities among
Languages for Machine Translation”. arXiv.
25. Mikolov, T., Chen, K., Corrado, G., Dean, J. (2013). “Efficient
Estimation of Word Representations in Vector Space”. arXiv.
26. Li, J., Chen, X., Hovy, E., Jurafsky, D. (2015). Visualizing and
Understanding Neural Models in NLP. In Proceedings of the 2016
Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies.
27. Karpathy, A., Johnson, J., Fei-Fei, L. (2015). Visualizing and
Understanding Recurrent Networks. International Conference of Learning
Representations. Retrieved from https://arxiv.org/abs/1506.02078
28. Voita, E., Sennrich, R., Titov I. (2019). The Bottom-up Evolution of
Representations in the Transformer: A Study with Machine Translation and
Language Modeling Objectives. In Proceedings of the 2019 Conference on
Empirical Methods in Natural Language Processing and the 9th
International Joint Conference on Natural Language Processing
(EMNLP-IJCNLP), pages 4396–4406. Association for Computational
Linguistics.
29. Zhang, S., Roller, S., Goyal, N. (2022). OPT - Open Pre-trained
Transformer Language Models. arXiv.
30. Black, S., Biderman, S., Hallahan, E. (2022). GPT-NeoX-20B: An
Open-Source Autoregressive Language Model. Association for
Computational Linguistics.