
The audio Uncanny Valley: Sound, fear and the horror game.


Abstract

The 1970 proposition that there is an Uncanny Valley which man-made characters inhabit as their human-likeness (both appearance and movement) increases has been a growing topic of debate in the fields of robotics, animation and computer games particularly since the turn of the century. However, what the theory and subsequent related writings do not account for is the role of sound in creating perceptions of uncanniness and fear, a particularly useful attribute in computer game genres such as survival horror. This paper has a dual purpose: to explore diverse writings on the uncanny as they relate to sound and to prepare the groundwork for future work investigating the possible relationship between sound and the Uncanny Valley. The paper comprises, in large part, a survey of selected works on the uncanny and the Uncanny Valley from a variety of disciplines. It emphasizes the link between uncanniness and negative emotions, such as fear and apprehension, and discusses the genesis of the term uncanny in early psychoanalytical writings, relating this to more modern theories on human emotion. Writings on the uncanny, or related emotional states, from psychoacoustics, textiles research, films and computer games are assessed as to their validity and potential application to the fostering of an aural climate of fear in computer games and, where such writings do not explicitly deal with sound, attempts are made to apply the ideas contained within to sound as it exists within computer games. In dealing with the theory of the Uncanny Valley, the paper points out the theory's focus on appearance and movement to the exclusion of sound and suggests that there is an uncanny in sound that might, in future, be used to modify the Uncanny Valley theory. Throughout, there is the suggestion that the uncanny (and any future theory of an audio or audiovisual Uncanny Valley) can be harnessed to the design of horror computer games.
Ultimately, it is hoped, such work will be of use to computer game sound designers who wish to create a greater perception of fear and apprehension through the canny use of uncanny sound. Some of the design tips presented at the end of the discussion are already used instinctively by sound designers across a range of media, including computer games, whereas others are less obvious in their origin and effect. Recently published empirical data is provided to strengthen the case for the latter. In some cases, the design tips must await the coming of procedural audio to computer games.
University of Bolton
UBIR: University of Bolton Institutional Repository
Games Computing and Creative Technologies:
Conference Papers (Peer-Reviewed)
School of Games Computing and Creative
Mark Grimshaw
University of Bolton,
This Conference Paper is brought to you for free and open access by the School of Games Computing and Creative Technologies at UBIR: University
of Bolton Institutional Repository. It has been accepted for inclusion in Games Computing and Creative Technologies: Conference Papers (Peer-
Reviewed) by an authorized administrator of UBIR: University of Bolton Institutional Repository. For more information, please contact
Digital Commons Citation
Grimshaw, Mark. "The audio Uncanny Valley: Sound, fear and the horror game." (2009). Games Computing and Creative
Technologies: Conference Papers (Peer-Reviewed). Paper 9.
Mark Grimshaw
School of Games Computing & Creative Technologies
University of Bolton
1. Introduction
In 1970, Mori defined the Uncanny Valley as the low point in
negative perception that a robot (or similar character) provokes
as it increasingly takes on human appearance.[1] According to
this theory, the effect is more pronounced when movement is
involved (see Fig. 1).
Figure 1: Mori's graph of the Uncanny Valley (from
Conceptually, the theory has its grounding in early
psychoanalytical work. Freud expands upon Jentsch's 1906
definition of the uncanny as being something fundamentally
familiar yet unfamiliar (life-like automata and waxworks being
some of the examples cited) by adding definitional refinements
of his own.[2] These include an uncanniness of coincidence,
fear of one's eye-balls being gouged out (Freud
characteristically equates this with the fear of castration),
vestigial irrational beliefs surfacing uneasily in a rational world
structure and, related to this, the uncovering of that which
should not come to light. A common thread running through his
analysis deals with the feelings of the person experiencing the
uncanny: feelings of eeriness, strangeness and fear. The
association of the uncanny with these emotional descriptors has
been emphasized by later writers discussing Mori's theory. In
particular, the emotion term fear has been equated to the
uncanny (for example, Ho et al. [3]) and this equation forms a
part of the underlying foundation of this paper.
The theory of the Uncanny Valley has been developed further in
the fields of robotics and computer games, with writers such as
MacDorman suggesting that the eerie sensations associated with
the uncanny might be used to advantage in the appropriate
context [4] (such as the survival horror genre of computer games;
see also Hoeger and Huber [5]). Not all authorities accept the
theory (Hanson suggests it is a pseudoscientific theory [6]);
nevertheless, as a concept, it does provide the basis for some
interesting discussion and can function as a stimulus for sound
design. What almost all studies dealing with the Uncanny
Valley share, though, is a concentration on the image. Whether
still or moving, such writings invariably deal with the
appearance and/or motion of the human-like character; there is
a visual bias to the study of the uncanny.
This paper asks: Is it possible to apply the Uncanny Valley
theory to the emotions aroused by sound and thus to include
sound as a factor in the Uncanny Valley? Although it does not
attempt to fully answer that question (such an answer must await
more empirical research), the paper does seek to identify
attributes of sound that contribute to the uncanny with the
assumption that, should this identification be achieved, it will
then be possible to codify aspects of computer game sound
design that elicit or block negative emotions such as fear and its
variants as the game genre requires. (In some future survival
horror game, the author imagines a 'less fear/more fear' sound
FX slider in the game set-up interface, or a continual
physiological monitoring of the player during gameplay that,
through real-time sound synthesis and audio processing, keeps
the player on a particular emotional roller-coaster.) Thus, where
many visual modellers attempt to cross the Uncanny Valley,
viewing it as an obstacle to overcome, this paper agrees with
authors such as MacDorman and Hoeger and Huber that the trough of
the Uncanny Valley is, in some game genres, to be welcomed.
Additionally, though, it also further supports earlier suggestions
that, despite claims to the contrary, the obstacle that is the
Uncanny Valley cannot, in fact, be overcome.[7]
2. Sound, the Uncanny and the Valley
In addition to psychoanalytical work, popular literature also
contains descriptions of the uncanny and, in some cases,
descriptions of uncanny sound. Some quotations from The
Beasts of Tarzan demonstrate this: “From the lips of the ape-
man came a weird, uncanny sound […] strange, uncanny notes
that the girl could not ascribe to any particular night prowler
more terrible because of their mystery […] he was afraid of the
jungle; uncanny noises that were indeed frightful came forth
from its recesses”.[8] Such uncanny sound is typically
associated with negative emotions such as Plutchik's basic
emotion of terror and its less intense outgrowths, fear and
apprehension.[9] (There are several theories of emotion but
Plutchik's is an interesting one to use in this context because of
its psychoevolutionary basis and the claim that emotions aid in
the survival of the organism in the environment. It is interesting
not merely because of similar claims made by writers on the
uncanny, as illustrated below, but also because the exemplars
used in many articles on fear in computer games tend to be
first-person shooters or horror games where the player operates in a
hostile environment.) These emotions, according to Plutchik,
derive from the threat stimulus event which itself, as the Tarzan
stories show, is often associated with the unknown, the
unfamiliar, the darkness and the night. Such emotions and
scenarios are the basic ingredients of the horror genre in
literature, film and computer games.
There has been surprisingly little work on the association of
emotions with sound (that is, sound FX as opposed to music or
speech). In the area of sound design for the horror genre of
cinema (which the parallel genre in computer games closely
follows), and in the absence of any comprehensive and well-
founded methodology, such design proceeds on the basis of
experience, cliché or trial and error. It is usually no less
effective for this. Some research in the area of computer games
deals broadly with sound as a means to increase physiological
and, therefore, emotional arousal in the player [10] and other
work suggests this physiological arousal and associated emotion
leads to player engagement and immersion in the 3-dimensional
environments of first-person shooter games.[11] For a general
overview of threat and associated emotions in computer games,
see Perron.[12]
Interestingly, most research and writing on emotion and sound
in virtual reality and computer games deals with the negative
emotions terror, fear and apprehension (using Plutchik's
terminology) and their semantic variants (using others'
terminologies); this paper continues that tradition. Outside of
3D-worlds, there is a wider survey of the emotions associated
with sound but, nevertheless, research is patchy. Owren and
Bachorowski, studying primate vocalizations, suggest that
primates use some sounds not to convey representational
information to a listener but to directly or indirectly affect and
arouse particular emotional states within the listener at a
fundamental cognitive level.[13] To do this, they manipulate
the parameters of the sound directly, and the authors hypothesize
that this is how some forms of human laughter and babies' calls
for attention, for example, work. Edworthy et al.
conducted an experiment on the perception of urgency as
various parameters of sound and harmonic patterns of audio
alarms were altered.[14] Their observations include: sounds
with a fast onset and offset of 20 ms or less are perceived as
more urgent than sounds with a longer onset than offset, which
are themselves perceived as more urgent than sounds with a
shorter onset than offset; and the more random the harmonic
pattern of the sounds, the more urgent the perception of those
sounds. The interesting result that sounds with a longer onset
than offset are perceived as more pressing than the reverse is
explained by the authors with the suggestion that the former
class of sounds has the characteristic amplitude envelope of
approaching sound sources whereas the latter class of sounds
has more of the character of receding sound sources. Although
no explanation is offered for the effect of melodic randomness,
it may be that the perception of urgency is related to the
uncertainty (in the West at least) arising from less tonally-
centred music and the consequent difficulty of processing the
tones and identifying a melodic pattern. This association of
uncertainty (a lack of fluency in the processing of sound) with
urgency and apprehension is developed further by authors
discussing fear and sound in horror computer games, a
discussion detailed below.
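As a rough illustration of the onset and offset parameters discussed above, the following Python sketch shapes a sine tone with a linear amplitude envelope and ranks an envelope according to the ordering that Edworthy et al. report. The function names, the linear ramps and the numeric ranks are assumptions made for this illustration only; they are not the stimuli or the analysis used in the original experiment.

```python
import math


def enveloped_tone(freq_hz, duration_s, onset_s, offset_s, sample_rate=44100):
    """Return a sine tone with a linear fade-in (onset) and fade-out (offset).

    Hypothetical helper: linear ramps are an assumption of this sketch.
    """
    n = int(duration_s * sample_rate)
    n_on = max(1, int(onset_s * sample_rate))
    n_off = max(1, int(offset_s * sample_rate))
    samples = []
    for i in range(n):
        # Gain rises from 0 over the onset, falls to 0 over the offset.
        gain = min(1.0, i / n_on, (n - 1 - i) / n_off)
        samples.append(gain * math.sin(2 * math.pi * freq_hz * i / sample_rate))
    return samples


def urgency_rank(onset_s, offset_s, fast_s=0.020):
    """Ordinal rank (2 = most urgent) following the ordering described above.

    Returns None for envelopes the reported ordering does not cover
    (equal onset and offset longer than fast_s).
    """
    if onset_s <= fast_s and offset_s <= fast_s:
        return 2  # fast onset and offset (20 ms or less)
    if onset_s > offset_s:
        return 1  # longer onset than offset ("approaching" character)
    if onset_s < offset_s:
        return 0  # shorter onset than offset ("receding" character)
    return None
```

A designer might use such a ranking only as a starting point: the ordering says nothing about how large the perceptual differences are, and it was reported for audio alarms rather than game sound FX.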
Alarms presage previously unseen threats, and threats, according
to Plutchik, are the stimulus events leading to feelings of terror,
fear and apprehension. Threat sounds are used to great effect in
the computer game Left 4 Dead, particularly where the actions of
a player alert the swarm of zombies.[15] In this case, a wolf-like
howl heralds the swarm's attack and it is the predatory
denotation and lycanthropic connotation that are designed to send
a chill up the spine. Paralleling this, Halpern et al., analyzing
the nerve-jarring sound of fingernails scraping across a
blackboard, suggest that aversion to such a sound might be
because of its similarity to predator sounds or (an
implicit suggestion in the paper) might be a vestigial response to
proto-human warning calls due to its similarity to the macaque
monkey's warning screech (this is acknowledged as pure
speculation in the paper and one of the authors has since
disowned the conjecture).[16]
Cho et al. conducted an investigation into the parameters of
textile sounds (that is, the rustling sounds of a variety of fabrics)
that, it was assumed, were responsible for negative feelings
about the textile.[17] Their results suggest that increasing
loudness and sharpness of timbre (the lowest sharpness acum
value for any of the fabrics was 2.38, equivalent to a band of
high frequency energy centred on approximately 5 kHz) produce
physiological reactions associated with negative emotions. In
other words, loud, sharp sounds are not generally pleasant.
Compare this to Halpern et al. who, when discussing parameters
of sound leading to unpleasant feelings, accept loudness as a
factor but discount high frequencies, instead pointing to low-mid
frequencies as the cause (a high-pass filter, attenuating
frequencies below 2 kHz, decreased the subjective
unpleasantness rating). This suggests that negative affect
responses may well be provoked by the presence of certain
frequencies but there are probably additional factors involved as
well. Context, connotation or something more subconscious are
suggested by the reference to predatory sounds or macaque
monkeys and their warning calls; more unpleasant physical
associations might be suggested by the sound of crackling static
electricity close to the skin. That there might be instinctive
negative emotional responses to unexpected loud sounds in
general is supported by the Moro Reflex, found through a limited
period in pre- and post-natal babies: a reflex response to sound
(or falling) which, it has been suggested, is founded upon the
one unlearned and innate human fear.
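To make the filtering manipulation mentioned above concrete, the sketch below applies a simple first-order high-pass filter with a 2 kHz cutoff to a list of samples. It is an assumption-laden illustration: the filter type, the function names and the default sample rate are chosen here for brevity and are not taken from Halpern et al., whose actual filter design is not described in this paper.

```python
import math


def high_pass(samples, cutoff_hz=2000.0, sample_rate=44100):
    """First-order (RC-style) high-pass filter over a list of samples.

    Hypothetical helper: attenuates energy below cutoff_hz while
    leaving higher frequencies largely intact.
    """
    if not samples:
        return []
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Standard discrete RC high-pass recurrence.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def rms(samples):
    """Root-mean-square level, used here to compare signals before and after filtering."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, duration_s=0.1, sample_rate=44100):
    """Test tone generator for trying the filter out."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

Filtering `sine(200)` and `sine(8000)` and comparing the `rms` values before and after shows the low tone strongly attenuated and the high tone largely preserved. A first-order filter has a gentle slope, so a practical sound-design tool would more likely use a steeper design.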
Moncrieff et al., in a study assessing the possibility of
automatically classifying horror and thriller films by their audio
content, analyzed the frequency of sound energy and affect
events (these latter are an intentional emotional inflexion of the
events visually portrayed).[18] Affect events are indexical,
there being a “high level of semantic association between the
sound energy and affect events” – where detected sound energy
patterns correspond to an affect event (such as an alarm or sense
of apprehension), they affirm the affect event and so the
detection of many such patterns, according to the authors, can be
used to classify film according to either horror or thriller genre.
Given the typical intensity of the sound energy associated with
the horror-type affect event, it would be interesting to see if the
authors' classification method (tested on Western,
Hollywood-style cinema) works for the cinema of other cultures. As Mala
has stated: “Asian horror is often rooted in vision”.[19] This
contention is supported by Ringu [20] director Nakata: “Other
people tend to use different sounds altogether to express horror,
but I can increase the perception of it to the maximum by
utilizing a very quiet sound”.[21] The manifestation of threat
stimulus events for fear and apprehension may well contain
features that do not function uniformly across the human race
but, instead, are culturally specific in their threat and meaning.
In a comparison of the uncanny in Ringu and the American
remake The Ring [22], Ball provides a short section on aural
uncanniness which, in Ringu, is exemplified for the author by
the audio processing applied to the familiar sound of a ringing
digital telephone.[23] Here, the uncanny is created through the
process of making the familiar strange (the ringing heard is a
combination of multiple telephone rings slightly processed to
match the film's theme of water). According to Ball, it is this
defamiliarization of a mundane sound (the distortion of a sound
that yet retains its broadly recognizable original form and
purpose) that leads to the uncanny. This is, perhaps, too broad
and all-encompassing an explanation. There are many varieties
of telephone ring, each approximating the classic and iconic
telephone bell (both analogue and digital) but altering it in some
way; there is no suggestion that any of these sounds are uncanny
despite being defamiliarizing distortions of the original, familiar
ring. Instead, the context of the ring plays an important role; not
only is it framed within the horror film genre, it is signalled
early as an apprehensive aural cue, a threat stimulus event,
through the film's plot and narrative.
Ekman and Kajastila, rather than investigating the parameters of
sound contributing to the feeling of being scared, conducted a
small-scale, subjective study to determine the perceptual effect
of localization on sounds already pre-judged to be 'scary'.[24]
Importantly, sounds were played back to participants in the
absence of any contextualizing image. Sounds comprised those
made by “large predators, which motivates the importance of
localizing the threat”. The results support the authors'
hypothesis “that the scariness of a (scary) sound is causally
related to how well it affords localizing a potentially harmful
source” and this, as the authors suggest, probably has its root
cause in the evolutionary link between fear and survival. The
inability to localize a sound is generalized to a lack of ease or
fluency in processing sound; thus, according to the authors,
“[t]he less information available, the more threatening the
situation should be”. This seems a surprisingly broad
assessment. It is unlikely that de-localizing all types of sound
will promote fear in the listener. Low-frequency sine waves,
and similar natural sounds such as whale song, are difficult to
localize yet are not necessarily threatening because of that;
recordings of whale song are often used for relaxation purposes.
The same could be said for the general hum of traffic outside my
office or the 50-60 Hz mains hum in a house. In the case of the
authors' study, a predatory sound (already judged to be scary) is
made more scary by removing the ability to localize it;
generalizing this to all sounds is perhaps a step too far.
A sustaining thread in Ekman and Kajastila's argument is the
impact of uncertainty on the perceived level of scariness of a
sound. The concept of uncertainty also appears in Kromand's
study of sound in the survival horror game genre; specifically,
“a framework of uncertainty that constantly holds the player
between knowledge and ignorance”.[25] According to
Kromand, the soundscapes of survival horror computer games
purposefully mislead by making it unclear whether the sounds
heard derive from within the game diegesis or without: this
“collapse of the barrier between the diegetic and non-diegetic
soundscape is a strategy to build a horror atmosphere”. The
removal of causality, an understanding of and awareness of the
source of the sound, and its unsettling result is something that
has been described by Chion in the context of cinema.[26] For
sounds having no visible source on screen, Chion appropriates
the electro-acoustic term acousmatic: “A sound or voice that
remains acousmatic creates a mystery of the nature of its sound
source, its properties and its powers”. In film, the decision to
unveil the mystery or not belongs solely to the director; in
computer games, as Stockburger makes clear, such unveiling is
more dynamic and sound sources may be unmasked by scripted
events designed into the game (equivalent to the decisions of the
cinematic director) or by the kinaesthetic intervention of the
player.[27] In the cases that Kromand identifies, the sounds are
destined to remain acousmatic and thus they are, as Parker and
Heerama state, “instinctively threatening”.[28]
Brenton et al. review a number of theories on presence, realism
and the Uncanny Valley from which they derive five hypotheses
on the relationship of the Uncanny Valley to presence in virtual
worlds.[29] The Gestalt-derived theory of presence suggests the
brain chooses one of a set of hypotheses relating either to what
we perceive of a virtual world or to where we physically are.
Engaging in the arcana of a virtual world yet still aware of the
mundanity of reality (the weighty effect of gravity or the
physical environment around the computer monitor, for
example), the brain will pick one or the other hypothesis. The
hypothesis chosen dictates where we feel present; in virtuality or
reality. A switch from virtual hypothesis to reality hypothesis is
a break in presence. As a conjecture, Brenton et al. theorize
that, in some cases, a break in presence may be related to the
Uncanny Valley because both concern a change between the
perception of two similar states or hypotheses. Further,
describing the acceptance of one perceptual hypothesis over
another as a switch might, they suggest, be incorrect. The
alternative theory they propose is that hypotheses are
superimposed and, at any one time, one or the other is dominant.
In suggesting that a break in presence is related to perception of
the uncanny and a realization of the Uncanny Valley, Brenton et
al. state that “[a]n uncanny character […] may be a weak link
that causes an unwanted break in presence”. However, if
uncanniness is related to negative emotions, such as fear and
apprehension, then such a perception, presumably, is, in fact,
wanted in the horror computer game. Tellingly, the authors
recount a previous study in which it was reported that a virtual
character elicited an uncanny response because its high level of
graphical realism was not matched by a similar level of
behavioural realism; the avatar seemed “like a zombie”. This
bears similarity to Laurel's statement that "we tend to expect
that the modalities involved in a representation will have
roughly the same "resolution" [...] A computer game that
incorporates breathtakingly high-resolution, high-speed
animation but produces only little beeps seems brain-
damaged".[30] It is interesting, in the context of a discussion on
uncanny sound and the horror computer game, that both sets of
authors choose to report or use terms such as 'zombie' or
'brain-damaged' when talking of a mismatch of modality.
Brenton et al. make no mention of sound in their paper.
However, the reported mismatch between appearance and
behaviour, and its apparently consequential uncanny result, can
be compared to a recent paper by Tinwell and Grimshaw that
did include the voices of virtual characters (as well as their
facial expression and facial behaviour) in a relatively large-
sample qualitative study of the Uncanny Valley.[31] The results
of this study led to the conclusions that, with increasing visual
human-likeness of the character, perceptions of the uncanny
increased: with a lack of human-likeness of the voice; with an
increasing exaggeration of the articulation of the mouth while
speaking; and with increasing lack of synchronization between
lips and voice. In addition to noting that this is a study of the
Uncanny Valley that combines image and sound, it is worth
noting that all three of these uncanny factors involve some form
of mismatch between the visual and aural modalities. It is
interesting to speculate on why it is the case that the reverse of
the first conclusion above (that an aural resolution that is low
compared to the visual resolution leads to perceptions of
uncanniness) does not appear to lead to the uncanny. Television
and cinema are replete with examples of human animations of
varying visual resolution over which real human voices are
dubbed. I have suggested previously that “the [human] voice
and its expression of language […] is the primary marker of the
human being as opposed to other species […] using a real
human voice dubbed onto an animation strengthens its
anthropomorphic nature” and that cultural recognition of the
primacy of the human voice (and the cognitive faculties it
implies) may be found in a range of creation myths from
Christianity ("In the beginning was the word") to the Mayan Popol
Vuh (Tepeu, Gucumatz and Juracán met and devised new
beings capable of understanding, of speaking, of revering them
3. Conclusion
Having described a range of aural factors (or relationships
between sound and image) that appear to influence perceptions
of uncanniness, one might suppose that, in order to engender
fear and apprehension in a horror computer game, all one needs
do is apply these factors to the design of sound or its
relationship to the image. It is, of course, not so simple. While
bad cinematic over-dubbing (or careless foreign language
dubbing) can be annoying and even, perhaps, unsettling to
experience (uncanniness is hinted at by Pollick [34]), in another
context, another frame, it can be humorous. Re-dubbings of
Hong Kong chop-socky movies, with exaggeratedly
unsynchronized voices, are a recurrent comedic staple, as are
high-pitched, helium-influenced voices dubbed onto men and
gruff, low-pitched voices on women. Context is all important
and, in the comedic examples given here, there is no suggestion
that there is anything uncanny or that there is something to be
fearful of (unlike the telephone ringing in Ringu described
above). This context, or framing, might, in fact, be another of
the Gestalt-like hypotheses: a choice, for example, that this is a
comedy, and so should be laughed at, rather than a horror film,
with which to experience delicious dread. If so, this lends
credence to Brenton et al.'s suggestion that such hypotheses
might be superimposed on each other but with the refinement
that dominant hypotheses might co-exist. In the case of a
first-person, survival horror computer game, the dominant
hypotheses are a virtual presence hypothesis (leading to
engagement and immersion in the game world) and a framing
hypothesis (cueing fear and apprehension rather than a typical
defence against terror, laughter) co-existing above the
hypothesis of reality and other framing hypotheses.
Furthermore, if the uncanny really is, as Brenton et al. suggest,
associated with the break in presence phenomenon, then the
choice of a horror context as the framing hypothesis guards
against the dislocation; the fear and unease associated with the
uncanny are an expected part of the fabric of the virtual world
and so there is no break in presence.
Brenton et al. and Minato et al. [35] have discussed aspects of
(visual) habituation to appearance and behaviour in relation to
the uncanny. The fatal flaw in computer game sound design, as
it currently stands, is that sound samples do not significantly
change across multiple re-playings of a game and thus what
might once have been unfamiliar and uncanny, becomes familiar
and mundane. The user becomes habituated to the sounds and
their use within the game world and knowledge replaces
uncertainty thus increasing confidence at the expense of fear.
Real-time sound synthesis (procedural audio) may go some way
to solving this issue (increasing rather than lessening uncertainty
and fear) by allowing the game engine to sonically respond in a
relatively unpredictable manner in real-time to the player's
presence and actions in the game world. Additionally,
increasing use of biofeedback coupled with procedural audio
techniques may well allow game engines to more precisely
manipulate the player's emotions through real-time analysis of
the player's emotional state. In this case, the game engine can
then itself make flexible decisions as to how to play with the
player's emotions: if the player is too calm, then perhaps
increase the level of sonic uncertainty.
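The biofeedback idea described above can be pictured as a simple feedback loop. The Python sketch below is purely hypothetical: it assumes a normalized arousal reading between 0 and 1 from some unspecified sensor, a designer-chosen target arousal, and two invented sound parameters (`pan_spread` and `pitch_jitter_semitones`) standing in for whatever a procedural audio engine might expose. None of these names come from an existing game engine or from the studies cited.

```python
def next_uncertainty(current, arousal, target=0.6, gain=0.5):
    """Proportional update of a 0..1 'sonic uncertainty' level.

    If measured arousal is below the target (player too calm), the level
    rises; if above, it falls. All values are assumed to be normalized.
    """
    error = target - arousal
    return min(1.0, max(0.0, current + gain * error))


def sound_parameters(uncertainty):
    """Map the uncertainty level to hypothetical procedural-audio controls."""
    return {
        # Wider spread makes the source harder to localize.
        "pan_spread": uncertainty,
        # Small random pitch variation makes repeated sounds less familiar.
        "pitch_jitter_semitones": 2.0 * uncertainty,
    }


if __name__ == "__main__":
    level = 0.3
    for arousal in [0.2, 0.3, 0.5, 0.8]:  # invented sensor readings
        level = next_uncertainty(level, arousal)
        print(round(level, 2), sound_parameters(level))
```

A proportional rule is used here only because it is the simplest option; whether players' emotional responses can be steered this directly is exactly the kind of question that awaits further empirical work.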
Whilst there is much, much more to understand about the
emotional effects of sound, the following general factors can be
used to either design in or design out uncanniness (and, by
extension, fear and apprehension) in the perception of a sound.
They are provided as a rough rule of thumb only, are based on
the small body of available research and may well work better in
combination. Above all, though, the sound designer should be
aware of the omnipotence of the framing context:
• Certain amplitude envelopes applied to sound affect
perceptions of urgency.
• Frequency might have an effect on the unpleasantness
of sound and this might lead to negative affect.
• Familiar or iconic sounds can be defamiliarized and
this can lead to perceptions of uncanniness.
• Uncertainty about the location of a sound source, its
cause or its meaning in the virtual world increases the
fear emotion.
• An aural resolution that is lower than a high quality,
human-like visual resolution might lead to the
uncanny.
• An exaggerated articulation of the mouth whilst
speaking might lead to the uncanny.
• A lack of synchronization between lips and voice for
photo-realistic virtual characters leads to a perception
of the uncanny.
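
As an illustration of just one of these factors, uncertainty about the location of a sound source, the following sketch blurs the rendered direction of a sound by adding random jitter before computing stereo gains. It is a hypothetical example only: constant-power panning is a standard technique, but the function names, the `uncertainty` parameter (0 to 1) and the jitter range are assumptions, and this is not the manipulation used in the studies cited in this paper.

```python
import math
import random

def pan_gains(azimuth_deg):
    """Constant-power stereo gains (left, right) for an azimuth in degrees.

    -90 is hard left, 0 is centre, +90 is hard right; values outside
    that range are clamped.
    """
    clamped = max(-90.0, min(90.0, azimuth_deg))
    angle = (clamped + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(angle), math.sin(angle)

def blurred_gains(true_azimuth_deg, uncertainty, rng):
    """Render a source with a randomly displaced direction.

    uncertainty = 0 keeps the true position; uncertainty = 1 allows a
    displacement of up to 90 degrees either way (assumed range).
    """
    jitter = rng.uniform(-90.0, 90.0) * uncertainty
    return pan_gains(true_azimuth_deg + jitter)

# Example: the same source rendered clearly, then with its position blurred.
rng = random.Random(0)
clear = blurred_gains(30.0, 0.0, rng)
vague = blurred_gains(30.0, 0.8, rng)
```

Calling `blurred_gains` afresh for each repetition of a sound would make the apparent source wander, which is one possible way of keeping the listener unsure of where it is.
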
In a later empirical paper, Tinwell and Grimshaw suggest that,
despite claims to the contrary as industry personnel unveil the
latest human-like character, the Uncanny Valley cannot be
traversed.[36] Whereas Brenton et al. suggest the Uncanny
Valley is subject to change over time (uncanny characters can
climb out of the valley as they become familiar through
experience and use), Tinwell and Grimshaw hypothesize that a
traversal (to the rightmost lip of the valley and out of it) is
impossible (assuming there is an element of uncanniness in the
artefact to begin with). On the basis of the results of their study,
the authors suggest that it is not familiarity but increasing
technological discernment on the part of the audience that
forbids the traverse. Like parallel railway tracks that meet at the
horizon, technological advances and human quality speed into
the distance seemingly ever closer. Yet, upon closer inspection
and further up the tracks, this convergence is shown to be
merely an illusion and the two are destined to remain separate.
Accordingly, the authors suggest that the Uncanny Valley
should rather be thought of as an Uncanny Wall. However, this
is merely a hypothesis to be tested and, for now, the theory of the
Uncanny Valley provides enough conceptual grist to still be of
use. Should the Uncanny Valley or similar prove to exist for
(human-like) sound, critics should, perhaps, be equally wary of
attempts to claim it has been overcome. Naturally, in the horror
game genre, the news that there is an impassable Uncanny Wall
rather than a traversable Uncanny Valley is, presumably, to be welcomed.
The theory of the Uncanny Valley (and its various expositions
thus far) deals solely with visual appearance, movement and/or
behaviour. It is clear, though, that there are parameters and
ways of representing sound that lead to perceptions of
uncanniness and associated negative affect. Future work, based
on further empirical research, will investigate whether the
Uncanny Valley can be used as a model for the perception of
uncanny sound or whether that sound follows its own uncanny
Footnote: The author, an atheist, is aware that this is an argument for a
distinguishing divine spark in humans.
[1] Mori, M., The uncanny valley, Energy, Volume 7, 33-35 (1970)
[2] Freud, S., The uncanny, The Standard Edition of the
Complete Psychological Works of Sigmund Freud, Volume 17,
London, Hogarth Press, 219-256 (1955)
[3] Ho, C.-C., MacDorman, K., & Pramono, Z. A. D., Human
emotion and the uncanny valley. A GLM, MDS, and ISOMAP
analysis of robot video ratings, Proceedings of the Third
ACM/IEEE International Conference on Human-Robot
Interaction, Amsterdam, 169-176, (2008)
[4] MacDorman, K. F., Subjective ratings of robot video clips
for human likeness, familiarity, and eeriness: An exploration of
the uncanny valley, ICCS/CogSci-2006 Long Symposium:
Toward Social Mechanisms of Android Science, Vancouver,
Canada, (2006)
[5] Hoeger, L., & Huber, W., Ghostly manipulation: Fatal
Frame II and the videogame uncanny, Situated Play,
Proceedings of DiGRA 2007, 152-156 (2007)
[6] Ferber, D., The man who mistook his girlfriend for a robot,
Popular Science,
[accessed: 27 April 2009], (2003)
[7] Plantec, P., Image Metrics attempts to leap the uncanny
valley, The Digital Eye,
[accessed: 27 April 2009], (2008)
[8] Burroughs, E. R., The beasts of Tarzan, Project Gutenberg, [accessed 2 May 2009],
[9] Plutchik, R., A general psychoevolutionary theory of
emotion, R. Plutchik & H. Kellerman (Eds.), Emotion: Theory,
research, and experience: Volume 1, Theories of emotion, New
York, Academic, 3-33 (1980)
[10] Shilling, R., Zyda, M., & Wardynski, E. C., Introducing
emotion into military simulation and videogame design:
America’s Army: Operations and VIRTE, GameOn, London,
[11] Grimshaw, M., Nacke, L., & Lindley, C. A., Sound and
immersion in the first-person shooter: Mixed measurement of
the player's sonic experience, Audio Mostly 2008, Piteå,
Sweden, (2008)
[12] Perron, B., Sign of a threat: The effects of warning systems
in survival horror games, COSIGN 2004, University of Split,
Croatia, (2004)
[13] Owren, M. J., & Bachorowski, J.-A., Reconsidering the
evolution of nonlinguistic communication: The case of laughter,
Journal of Nonverbal Behavior, Volume 27(3), 183-200 (2003)
[14] Edworthy, J., Loxley, S., & Dennis, I., Improving auditory
warning design: Relationship between warning sound
parameters and perceived urgency, Human Factors, Volume
33(2), 205-231 (1991)
[15] Valve, Left 4 Dead, (2008)
[16] Halpern, D. L., Blake, R., & Hillenbrand, J.,
Psychoacoustics of a chilling sound, Perception & Psychophysics,
Volume 39(2), 77-80 (1986)
[17] Cho, J., Yi, E., & Cho, G., Physiological responses evoked
by fabric sounds and related mechanical and acoustical
properties, Textile Research Journal, Volume 71(12),
1068-1073 (2001)
[18] Moncrieff, S., Venkatesh, S., & Dorai, C., Horror film
genre typing and scene labelling via audio analysis,
International Conference on Multimedia and Expo, (2003)
[19] Mala, E., The sound of horror, Newsweek, (2008)
[20] Nakata, H., Ringu, (1998)
[21] Naito, T., Interview with Hideo Nakata, Specter Director,
Kateigaho, (2005)
[22] Verbinski, G., The Ring, (2002)
[23] Ball, S. K. V. M., The uncanny in Japanese and American
horror film: Hideo Nakata's Ringu and Gore Verbinski's Ring,
Unpublished master's thesis, North Carolina State University,
Raleigh, NC, (2006)
[24] Ekman, I., & Kajastila, R., Localisation cues affect
emotional judgements: Results from a user study on scary
sound, AES 35th International Conference, London, (2009)
[25] Kromand, D., Sound and the diegesis in survival-horror
games, Audio Mostly 2008, Piteå, Sweden, (2008)
[26] Chion, M., Audio-vision: Sound on screen (C. Gorbman,
Trans.), New York, Columbia University Press, (1994)
[27] Stockburger, A., The rendered arena: Modalities of space
in video and computer games, Unpublished PhD thesis,
University of the Arts, London, (2006)
[28] Parker, J. R., & Heerema, J., Audio interaction in computer
mediated games, International Journal of Computer Games
Technology, 2008,
8923 [accessed: 27 December 2007], (2008)
[29] Brenton, H., Gillies, M., Ballin, D., & Chatting, D. J., The
uncanny valley: Does it exist? Proceedings of the Human-
Animated Characters Interaction, HCI 2005: The Bigger Picture,
[30] Laurel, B., Computers as theatre. New York, Addison-
Wesley, (1993)
[31] Tinwell, A. & Grimshaw, M., Survival horror games - An
uncanny modality, Thinking After Dark, Montreal, (2009)
[32] Acoyauh (trans.), The creation,
creation.html [accessed 2 May 2009]
[33] Grimshaw, M., The acoustic ecology of the first-person
shooter: The player experience of sound in the first-person
shooter computer game, Saarbrücken, VDM Verlag, (2008)
[34] Pollick, F. E., In search of the uncanny valley, In K.
Grammer, & A. Juett (Eds.), Analog communication: Evolution,
brain mechanisms, dynamics, simulation, Cambridge, MA: MIT
Press, The Vienna Series in Theoretical Biology, (in press)
[35] Minato, T., Shimada, M., Ishiguro, H., & Itakura, S.,
Development of an android robot for studying human-robot
interaction, In Orchard, R., Yang, C and Ali, M., (Eds),
Innovations in Applied Artificial Intelligence, Volume 3029,
424-434 Berlin: Springer (2004)
[36] Tinwell, A. & Grimshaw, M., Bridging the uncanny: An
impossible traverse?, Mindtrek, Tampere, (2009)