Mediating the Expression of Emotion in
Educational Collaborative Virtual
Environments: An Experimental Study.
MARC FABRI, DAVID MOORE
ISLE Research Group, Leeds Metropolitan University
m.fabri@leedsmet.ac.uk, d.moore@leedsmet.ac.uk
tel +44-113 283 2600
fax +44-113 283 3182
DAVE HOBBS
School of Informatics, University of Bradford
d.hobbs@bradford.ac.uk
tel +44-1274 236 135
fax +44-1274 233 727
Abstract: The use of avatars with emotionally expressive faces is potentially highly beneficial to
communication in collaborative virtual environments (CVEs), especially when used in a distance
learning context. However, little is known about how, or indeed whether, emotions can effectively
be transmitted through the medium of CVE. Given this, an avatar head model with limited but
human-like expressive abilities was built, designed to enrich CVE communication. Based on the
Facial Action Coding System (FACS), the head was designed to express, in a readily recognisable
manner, the six universal emotions. An experiment was conducted to investigate the efficacy of
the model. Results indicate that the approach of applying the FACS model to virtual face
representations is not guaranteed to work for all expressions of a particular emotion category.
However, given appropriate use of the model, emotions can effectively be visualised with a
limited number of facial features. A set of exemplar facial expressions is presented.
Keywords: avatar, collaborative virtual environment, emotion, facial expression
Fabri, M., Moore, D.J., Hobbs, D.J (2004) Mediating the Expression of Emotion in Educational Collaborative
Virtual Environments: An Experimental Study, in International Journal of Virtual Reality, Springer Verlag, London
Received: 3 September 2002 Accepted: 2 October 2003 Published online: 5 February 2004
http://dx.doi.org/10.1007/s10055-003-0116-7
1. Introduction
This document outlines an experimental study to investigate the use of facial
expressions for humanoid user representations as a means of non-verbal
communication in CVEs. The intention is to establish detailed knowledge about
how facial expressions can be effectively and efficiently visualised in CVEs.
We start by arguing for the insufficiency of existing distance communication
media in terms of emotional context and means for emotional expression, and
propose that this problem could be overcome by enabling people to meet virtually
in a CVE and engage in quasi face-to-face communication via their avatars. We
further argue that the use of avatars with emotionally expressive faces is
potentially highly beneficial to communication in CVEs.
However, although research in the field of CVEs has been proceeding for some
time now, the representation of user embodiments, or avatars, in most systems is
still relatively simple and rudimentary [1]. In particular, virtual environments are
often poor in terms of the emotional cues that they convey [2]. Accordingly, the
need for sophisticated ways to reflect emotions in virtual embodiments has been
pointed out repeatedly in recent investigations [3,4].
In the light of this, a controlled experiment was conducted to investigate the
applicability of non-verbal means of expression, particularly the use of facial
expressions, via avatars in CVE systems. It is the purpose of the experiment to
establish whether and how emotions can effectively be transmitted through the
medium of CVE.
2. A Case for CVEs in Education
Today's information society provides us with numerous technological options to
facilitate human interaction over a distance, in real time or asynchronously:
telephony, electronic mail, text-based chat, video-conferencing systems. These
tools are useful, and indeed crucial, for people who cannot come together
physically but need to discuss, collaborate on, or even dispute certain matters.
Distance Learning programmes make extensive use of such technologies to enable
communication between spatially separated tutors and learners, and between
learners and fellow learners [5]. Extensive research [6,7,8] has shown that such
interaction is crucial for the learning process, for the purpose of mutual reflection
on actions and problem solutions, for motivation and stimulation as well as
assessment and control of progress. It has given rise to a growing body of literature
in computer-supported collaborative learning, cf. [9,10].
However, when communicating over a distance through media tools, the
emotional context is often lost, as well as the ability to express emotional states in
the way one is accustomed to in face-to-face conversations. When using text-based
tools, important indicators like accentuation, emotion, and change of emotion or
intonation are difficult to mediate [11]. Audio conferencing tools can alleviate some
of these difficulties but lack ways to mediate non-verbal means of communication
such as facial expressions, posture or gesture. These channels, however, play an
important role in human interaction and it has been argued that the socio-emotional
content they convey is vital for building relationships that need to go beyond purely
factual and task-oriented communication [11].
Video conferencing can alleviate some of the shortcomings concerning body
language and visual expression of a participant's emotional state. Daly-Jones et al
[12] identify several advantages of video conferencing over high quality audio
conferencing, in particular the vague awareness of an interlocutor's attentional
focus. However, because of the non-immersive character of typical video-based
interfaces, conversational threads during meetings can easily break down when
people are distracted by external influences or have to change the active window,
for example to handle electronically shared data [13].
CVEs are a potential alternative to these communication tools, aiming to
overcome the lack of emotional and social context whilst at the same time
offering a stimulating and integrated framework for conversation and
collaboration. Indeed, it can be argued that CVEs represent a communication
technology in their own right due to the highly visual and interactive character of
the interface that allows communication and the representation of information in
new, innovative ways. Users are likely to be actively engaged in interaction with
the virtual world and with other inhabitants. In the distance learning discipline in
particular, this high-level interactivity, where the users' senses are engaged in the
action and they 'feel' they are participating in it, is seen as an essential factor for
effective and efficient learning [14].
3. The need for emotionally expressive avatars
The term "non-verbal communication" is commonly used to describe all human
communication events which transcend the spoken or written word [15]. It plays a
substantial role in human interpersonal behaviour. Social psychologists argue that
more than 65% of the information exchanged during a person-to-person
conversation is carried on the non-verbal band [16]. Argyle [17] sees non-verbal
behaviour taking place whenever one person influences another by means of facial
expressions, gestures, body posture, bodily contact, gaze and pupil dilation,
spatial behaviour, clothes, appearance, or non-verbal vocalisation (e.g. murmur).
A particularly important aspect of non-verbal communication is its use to convey
information concerning the emotional state of interlocutors. Wherever one
interacts with another person, that other person's emotional expressions are
monitored and interpreted – and the other person is doing the same [18]. Indeed,
the ability to judge the emotional state of others is considered an important goal in
human perception [19], and it is argued that from an evolutionary point of view, it
is probably the most significant function of interpersonal perception. Since
different emotional states are likely to lead to different courses of action, it can be
crucial for survival to be able to recognise emotional states, in particular anger or
fear in another person. Similarly, Argyle [17] argues that the expression of
emotion, in the face or through the body, is part of a wider system of natural
human communication that has evolved to facilitate social life. Keltner [20]
showed, for example, that embarrassment is an appeasement signal that helps
reconcile relations when they have gone awry, a way of apologising for making a
social faux pas. Further, recent findings in psychology and neurology suggest that
emotions are also an important factor in decision-making, problem solving,
cognition and intelligence in general [see 19,21,22,23].
Of particular importance from the point of view of education, it has been argued
that the ability to show emotions, empathy and understanding through facial
expressions and body language is central to ensuring the quality of tutor-learner and
learner-learner interaction [24]. Acceptance and understanding of ideas and
feelings, encouraging and criticising, silence, questioning – all involve non-verbal
elements of interaction [15,24].
Given this, it can be argued that CSCL technologies ought to provide for at
least some degree of non-verbal, and in particular emotional, communication. For
instance, the pedagogical agent STEVE [25] is used in a virtual training
environment for control panel operation. STEVE has the ability to give instant
praise or express criticism via hand and head gestures depending on a student's
performance. Concerning CVE technology in particular, McGrath and Prinz [26]
call for appropriate ways to express presence and awareness in order to aid
communication between inhabitants, be it full verbal communication or non-
verbal presence in silence.
Thalmann [1] sees a direct relation between the quality of users' representations
and their ability to interact with the environment and with each other. Even
avatars with rather primitive expressive abilities can potentially cause strong
emotional responses in people using a CVE system [27]. It appears, then, that the
avatar can readily take on a personal role, thereby increasing the sense of
togetherness, the community feeling. It potentially becomes a genuine
representation of the underlying individual, not only visually, but also within a
social context.
It is argued, then, that people’s naturally developed skill to “read” emotional
expressions is potentially highly beneficial to communication in CVEs in general,
and educational CVEs in particular. The emotionally expressive nature of an
interlocutor's avatar may be able to aid the communication process and provide
information that would otherwise be difficult to mediate.
4. Modelling an emotionally expressive avatar
Given that emotional expressiveness would be a desirable attribute of CVE, the
issue becomes one of how such emotional expressions can be mediated. Whilst all
of the different channels for non-verbal communication - face, gaze, gesture,
posture - can in principle be mediated in CVEs to a certain degree, our current
work focuses on the face, since in the real world the face is the most immediate
indicator of a person's emotional state [28].
While physiology looks beneath the skin, physiognomy stays on the surface
studying facial features and lineaments. It is the art of judging character or the
emotional state of an individual from the features of the face [29]. The face
reflects interpersonal attitudes, provides feedback on the comments of others, and
is regarded as the primary source of information after human speech [15].
Production (encoding) and recognition (decoding) of distinct facial expressions
constitute a signalling system between humans [30]. Surakka and Hietanen [31]
see facial expressions of emotion clearly dominating over vocal expressions of
emotion; Knapp [15] generally considers facial expressions as the primary site for
communication of emotional states.
Indeed, most researchers even suggest that the ability to classify facial expressions
of an interlocutor is a necessary pre-requisite for the inference of emotion. It
appears that there are certain key stimuli in the human face that support cognition.
Zebrowitz [32] found that, for example, in the case of an infant's appearance these
key stimuli can by themselves trigger favourable emotional responses. Strongman
[18] points out that humans make such responses not only to the expression but
also to what is believed to be the “meaning” behind the expression.
Our work therefore concentrates on the face. To model an emotionally expressive
avatar face, we follow the work of [33], which identified six universal facial
expressions corresponding to the following emotions: Surprise, Anger,
Fear, Happiness, Disgust/Contempt, and Sadness. This categorisation is widely
accepted, and considerable research has shown that these basic emotions can be
accurately communicated by facial expressions [32,34]. Indeed, it is held that
expression and, to an extent, recognition, of these six emotions have an innate
basis. They can be found in all cultures, and correspond to distinctive patterns of
physiognomic arousal. Figure 1 shows sample photographs depicting the six
universal emotions, together with the neutral expression (from [35], used with
permission).
Figure 1: The six universal emotions and neutral expression
4.1 Describing facial expression
Great effort has gone into the development of scoring systems for facial
movements. These systems attempt to objectively describe and quantify all
visually discriminable units of facial action seen in adults. For the purpose of
analysis, the face is typically broken down into three areas:
1. brows and forehead
2. eyes, eyelids, and root of the nose
3. lower face with mouth, nose, cheeks, and chin
These are the areas which appear to be capable of independent movement. In
order to describe the visible muscle activity in the face comprehensively, the
“Facial Action Coding System” (FACS) was developed [36]. FACS is based on
highly detailed anatomical studies of human faces and results from a major body
of work. It has formed the basis for numerous series of experiments in social
psychology, computer vision and computer animation [cf. 37,38,39,40].
A facial expression is a high level description of facial motions, which can be
decomposed into certain muscular activities, i.e. relaxation or contraction, called
“Action Units” (AUs). FACS identifies 58 action units, which separately or in
various combinations are capable of characterising any human expression. An AU
corresponds to an action produced by one or a group of related muscles. Action
Unit 1, for example, is the inner-brow-raiser, a contraction of the central frontalis
muscle. Action Unit 7 is the lid-tightener, tightening the eyelids and thereby
narrowing the eye opening.
FACS is usually coded from video or photographs, and a trained human FACS
coder decomposes an observed expression into the specific AUs that occurred,
their duration, onset, and offset time [37]. From this system, some very specific
details can be learnt about facial movement for different emotional expressions of
humans in the real world. For instance, the brow seems capable of the fewest
positions and the lower face the most [15]. Certain emotions also seem to manifest
themselves in particular areas of the face. The best predictors for anger for
example are the lower face and the brows/forehead area, whereas sadness is most
revealed in the area around the eyes [15].
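To make the coding scheme concrete, the following Python sketch illustrates how an observed expression might be recorded as a set of action-unit events, much as a human FACS coder would note AU number, intensity and timing. It is our own illustration under simple assumptions (the field names and the sample event values are hypothetical), not part of the FACS toolset or of the study software.

```python
from dataclasses import dataclass

@dataclass
class ActionUnitEvent:
    """One FACS action unit observed in an expression (illustrative fields)."""
    au: int            # FACS action unit number, e.g. 1 = inner brow raiser
    intensity: str     # FACS intensity score, 'A' (weakest) to 'E' (strongest)
    onset: float       # seconds into the recording when the action starts
    offset: float      # seconds when the face returns to neutral

# A coder's decomposition of a hypothetical brief expression:
observed = [
    ActionUnitEvent(au=1, intensity="C", onset=0.2, offset=1.1),  # inner brow raiser
    ActionUnitEvent(au=7, intensity="B", onset=0.3, offset=1.0),  # lid tightener
]

def active_aus(events: list[ActionUnitEvent], t: float) -> set[int]:
    """Action units active at time t, e.g. for driving an animated face."""
    return {e.au for e in events if e.onset <= t <= e.offset}

print(active_aus(observed, 0.5))   # {1, 7}
```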
For our current modelling work, then, FACS is adapted to generate the expression
of emotions in the virtual face, by applying a limited number of relevant action
units to the animated head. Figure 2 shows photographs of some alternative
expressions for the anger emotion category, together with the corresponding
virtual head expressions as modelled by our avatar. Equivalent representations
exist for all remaining universal emotions (and the neutral expression). All
photographs are taken from the Pictures of Facial Affect databank [35].
Figure 2: Photographs showing variations of Anger, with corresponding virtual heads
4.2 Keeping it simple
Interest in modelling the human face has been strong in the computer graphics
community since the 1980s. The first muscle-based model of an animated face,
using geometric deformation operators to control a large number of muscle units,
was developed by Platt and Badler [41]. This was developed further by modelling
the anatomical nature of facial muscles and the elastic nature of human skin,
resulting in a dynamic muscle model [40,42].
The approach adopted in this study, however, is feature-based and therefore less
complex than a realistic simulation of real-life physiology. It is argued that it is
not necessary, and indeed may be counter-productive, to assume that a “good”
avatar has to be a realistic and very accurate representation of the real world
physiognomy. We argue this partly on the grounds that early evidence suggested
that approaches aiming to reproduce human physiology in detail may in fact be
wasteful [43].
A related phenomenon has been described as the Uncanny Valley [44], a concept
originally proposed to predict human psychological reactions to humanoid robots
(see figure 3, adapted from [45]). When plotting human reaction against robot
movement, the curve
initially shows a steady upward trend. That trend continues until the robot reaches
reasonably human quality. It then plunges down dramatically, even evoking a
negative emotional response. A nearly human robot is considered irritating and
repulsive. The curve only rises again once the robot eventually reaches complete
resemblance with humans.
Figure 3: The "Uncanny Valley"
It is postulated that human reaction to avatars is similarly characterised by an
uncanny valley. An avatar that is designed to suspend disbelief but is only nearly
realistic may be equally confusing; it may not be accepted and may even be
considered repulsive.
In any event, Hindmarsh et al [46] suggest that even with full realism and full
perceptual capabilities of physical human bodies in virtual space, opportunities for
employing more inventive and evocative ways of expression would probably be
lost if the focus is merely on simulating the real world - with its rules, habits and
limitations.
It may be more appropriate, and indeed more supportive of perception and
cognition, to represent issues in simple or unusual ways. Godenschweger et al
[47] found that minimalist drawings of body parts, showing gestures, were
generally easier to recognise than more complex representations. Further, Donath
[48] warns that because the face is so highly expressive and humans are so adept
in reading (into) it, any level of detail in 3D facial rendering could potentially
provoke the interpretation of various social messages. If these messages are
unintentional, the face will arguably be hindering rather than helping
communication.
Again, there is evidence that particularly distinctive faces can convey emotions
more efficiently than normal faces [32,49,50], a detail regularly employed by
caricaturists. The human perception system can recognise physiognomic clues, in
particular facial expressions, from very few visual stimuli [51].
To summarise, rather than simulating the real world accurately, we aim to take
advantage of humans’ innate cognitive abilities to perceive, recognise and
interpret distinctive physiognomic clues. With regard to avatar expressiveness and
the uncanny valley, we are targeting the first summit of the curve (see figure 3)
where human emotional response is maximised while employing a relatively
simple avatar model.
4.3 Modelling facial expression
In order to realise such an approach in our avatar work, we developed an animated
virtual head with a limited number of controllable features. It is loosely based on
the H-Anim specification [52] developed by the international panel that develops
the Virtual Reality Modeling Language (VRML). H-Anim specifies seven control
parameters:
1. left eyeball
2. right eyeball
3. left eyebrow
4. right eyebrow
5. left upper eyelid
6. right upper eyelid
7. temporomandibular (for moving the jaw)
Early in the investigation it became evident, however, that eyeball movement was
not necessary as the virtual head was always in direct eye contact with the
observer. We also found that although we were aiming at a simple model, a single
parameter for moving and animating the mouth area (temporomandibular) was
insufficient for the variety of expressions required in the lower face area.
Consequently, the H-Anim basis was developed further and additional features
were derived from, and closely mapped to, FACS action units. This allowed for
greater freedom, especially in the mouth area. It has to be noted that while FACS
describes muscle movement, our animated head was not designed to necessarily
emulate such muscle movement faithfully, but to achieve a visual effect very
similar to the result of muscle activity in the human face.
It turned out that it is not necessary for the entire set of action units to be
reproduced in order to achieve the level of detail envisaged for the current face
model. In fact, reducing the number of relevant action units is not uncommon
practice for simple facial animation models [see 53,54], and this study used a
subset of 11 action units (see table 1).
Table 1: Reduced set of Action Units
AU | Facial Action Code | Muscular Basis
1 | Inner Brow Raiser | Frontalis, Pars Medialis
2 | Outer Brow Raiser | Frontalis, Pars Lateralis
4 | Brow Lowerer | Depressor Glabellae, Depressor Supercilii, Corrugator
5 | Upper Lid Raiser | Levator Palpebrae Superioris
7 | Lid Tightener | Orbicularis Oculi, Pars Palpebralis
10 | Upper Lip Raiser | Levator Labii Superioris, Caput Infraorbitalis
12 | Lip Corner Puller | Zygomatic Major
15 | Lip Corner Depressor | Triangularis
17 | Chin Raiser | Mentalis
25 | Lips Part | Depressor Labii, Relaxation of Mentalis or Orbicularis Oris
26 | Jaw Drop (mouth only) | Masseter, Relaxation of Temporal and Internal Pterygoids
The relevant animation control parameters required to model facial features that
correspond to these 11 action units are illustrated in figure 4.
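As an illustration of how such a feature-based mapping might be organised, the sketch below associates each action unit of the reduced set with hypothetical virtual-head control parameters and combines the displacements for a given expression. The parameter names and offset values are assumptions for illustration only; the study's head used its own feature set derived from H-Anim and FACS.

```python
# Hypothetical mapping from the reduced FACS action-unit subset (Table 1) to
# named control parameters of a virtual head. Names and displacement values
# are illustrative assumptions, not the study's actual feature set.
AU_TO_PARAMS = {
    1:  {"inner_brow_raise": 1.0},                       # Inner Brow Raiser
    2:  {"outer_brow_raise": 1.0},                       # Outer Brow Raiser
    4:  {"inner_brow_raise": -1.0, "brow_squeeze": 1.0}, # Brow Lowerer
    5:  {"upper_lid_raise": 1.0},                        # Upper Lid Raiser
    7:  {"lid_tighten": 1.0},                            # Lid Tightener
    10: {"upper_lip_raise": 1.0},                        # Upper Lip Raiser
    12: {"lip_corner_pull": 1.0},                        # Lip Corner Puller
    15: {"lip_corner_depress": 1.0},                     # Lip Corner Depressor
    17: {"chin_raise": 1.0},                             # Chin Raiser
    25: {"lips_part": 1.0},                              # Lips Part
    26: {"jaw_drop": 1.0},                               # Jaw Drop (mouth only)
}

def expression_to_params(aus: set[int], weight: float = 1.0) -> dict[str, float]:
    """Combine the control-parameter displacements for a set of active AUs."""
    params: dict[str, float] = {}
    for au in aus:
        for name, offset in AU_TO_PARAMS[au].items():
            params[name] = params.get(name, 0.0) + weight * offset
    return params

# Example: an anger-like configuration (brow lowerer, lid tightener, chin raiser)
print(expression_to_params({4, 7, 17}))
```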
Figure 4: Controllable features of the virtual head
As an example, figure 5 shows four variations of the Sadness emotion, as used in
the experiment. Note the wider eye opening in 1, and the change of angle and
position of eyebrows.
Figure 5: Variations within emotion category Sadness
Certain facial features have deliberately been omitted to keep the number of
control parameters, and action units, low. For example, AU12 (lip corner puller)
normally involves a change in cheek appearance. The virtual head however shows
AU12 only in the mouth corners. Also, the virtual head showing AU26 (jaw drop)
does not involve jawbone movement but is characterised solely by the relaxation
of the mentalis muscle, resulting in a characteristic opening of the mouth. These
omissions were considered tolerable, as they did not appear to change the visual
appearance of the expression significantly. Indeed, neither the statistical analysis
nor feedback from participants indicated any disadvantage of doing so.
In summary, then, we argue that the virtual face model introduced above is a
potentially effective and efficient means for conveying emotion in CVEs. By
reducing the facial animation to a minimal set of features believed to display the
most distinctive area segments of the six universal expressions of emotion
(according to [28]), we take into account findings from cognitive and social
psychology. These findings suggest that there are internal, probably innate,
physiognomic schemata that support face perception and emotion recognition in
the face [55]. This recognition process works with even a very limited set of
simple but distinctive visual clues [17,51].
5. Experimental investigation
We argue, then, that there is a strong prima facie case that the proposed virtual
head, with its limited, but human-like expressive abilities, is a potentially effective
and efficient means to convey emotions in virtual environments, and that the
reduced set of action units and the resulting facial animation control parameters
are sufficient to express, in a readily recognisable manner, the six universal
emotions.
We have experimentally investigated this prima facie argument, comparing
recognition rates of virtual head expressions with recognition rates based on
photographs of faces for which FACS action unit coding, as well as recognition
rates from human participants, was available. These photographs were taken from
[35]. A detailed description of the experimental setup is presented in this section.
The aims of the experiment were (a) to investigate the use of simple but
distinctive visual clues to mediate the emotional and social state of a CVE user,
and (b) to establish the most distinctive and essential features of an avatar facial
expression.
Given these aims, the experiment was designed to address the following working
hypothesis: “For a well-defined subset that includes at least one expression per
emotion category, recognition rates of the virtual head model and of the
corresponding photographs are comparable”.
5.1 Design
The independent variable (IV) in this study is the stimulus material presented to
the participants. The facial expressions of emotion are presented in two different
ways, as FACS training photographs or displayed by the animated virtual head.
Within each of these two factors, there are seven sub-levels (the six universal
expressions of emotion and neutral). The dependent variable (DV) is the success
rate achieved when assigning the presented expressions of emotion to their
respective categories.
Two control variables (CVs) can be identified: the cultural background of
participants and their previous experience in similar psychological experiments.
Since the cultural background of participants may affect their ability to
recognise certain emotions in the face [32], this factor was neutralised by ensuring
that all participants had broadly the same ability concerning the recognition of
emotion. In the same manner, it was checked that none of the participants had
previous experience with FACS coding or related psychological experiments, as
such experience may influence perception abilities through specifically developed skills.
We adopted a one-factor, within subjects design (also known as repeated
measures design) for the experiment. The factor comprises two levels, photograph
or virtual face, and each participant performs under both conditions:
Condition A: emotions depicted by virtual head
Condition B: emotions shown by persons on FACS photographs
Twenty-nine participants took part in the experiment, 17 female and 12 male,
aged between 22 and 51 years. All participants were volunteers. None had
classified facial expressions or used FACS before. None of the participants
worked in facial animation, although some were familiar with 3D modelling
techniques in general.
5.2 Procedure
The experiment involved three phases: a pre-test questionnaire, a recognition
exercise and a post-test questionnaire. Each participant was welcomed by the
researcher and seated at the workstation where the experiment would be
conducted. The researcher then gave the participant an overview of what was
expected of him/her and what to expect during the experiment. Care was taken not
to give out information that might bias the user. The participants were assured that
they themselves were not under evaluation and that they could leave the
experiment at any point if they felt uncomfortable. Participants were then
presented with the pre-test questionnaire, which led into the recognition exercise.
From this moment, the experiment ran automatically, via a bespoke software
application, and no further experimenter intervention was required.
The actual experiment was preceded by a pilot test with a single participant. This
participant was not part of the participant group in the later experiment. The pilot
run confirmed that the software designed to present the stimulus material and
collect the data was functioning correctly, and also that a duration of around 20
minutes per participant was realistic. Further, it gave indications that the
questionnaire items possessed the desired qualities of measurement and
discriminability.
The pre-test questionnaire (figure 6) collected information about the participant's
suitability for the experiment.
Figure 6: Pre-test questionnaire
The Cancel button allowed the participant to abort the experiment at any stage, in
which case all data collected up to that point was deleted. Back and Next buttons were displayed
depending on the current context. A screen collecting further data about the
participant's background on FACS as well as possible involvement in similar
experiments followed the pre-test questionnaire.
Before the recognition task started, a “practice” screen, illustrating the actual
recognition screen and explaining the choice of emotion categories and the
functionality of buttons and screen elements, was shown to the participant.
During the recognition task, each participant was shown 28 photographs and 28
corresponding virtual head images, mixed together in a randomly generated order
that was the same for all participants. Each of the six emotion categories was
represented in 4 variations, and 4 variations of the neutral face were also shown.
The variations were defined not by intensity, but by differences in expression of
the same emotion. The controllable parameters of the virtual head were adjusted
so that they corresponded with the photographs.
All material was presented in digitised form, i.e. as virtual head screenshots and
scanned photographs, respectively. Each participant was therefore asked to classify
56 expressions (2 conditions x 7 emotion categories x 4 variations per category).
All virtual head images depicted the same male model throughout, whereas the
photographs showed several people, expressing a varying number of emotions (21
images showing male persons, 8 female). Where the facial atlas did not provide 4
distinctive variations of a particular emotion category, or the virtual head could not
show a variation because of the limited set of animation parameters, a similar face
was repeated.
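For illustration, the presentation order described above could be generated along the following lines; the seed value and the tuple identifiers are our own assumptions, not details of the bespoke experiment software.

```python
import random

CATEGORIES = ["Surprise", "Fear", "Disgust", "Anger", "Happiness", "Sadness", "Neutral"]
CONDITIONS = ["photograph", "virtual_head"]   # conditions B and A respectively
VARIATIONS = [1, 2, 3, 4]

# Build the full stimulus set: 2 conditions x 7 categories x 4 variations = 56 items.
stimuli = [(cond, cat, var) for cond in CONDITIONS
                            for cat in CATEGORIES
                            for var in VARIATIONS]

# One randomly generated order, fixed by a seed so that every participant sees
# photographs and virtual heads mixed together in the same sequence.
rng = random.Random(2002)   # arbitrary seed; the study fixed the order once
presentation_order = stimuli[:]
rng.shuffle(presentation_order)

assert len(presentation_order) == 56
print(presentation_order[:3])
```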
The face images used in the task were cropped to display the full face, including
hair. Photographs were scaled to 320 x 480 pixels, whereas virtual head images
were slightly smaller at 320 x 440 pixels. The data collected for each facial
expression of emotion consisted of:
- the type of stimulus material
- the expression depicted by each of the facial areas
- the emotion category expected
- the emotion category picked by the participant
A “recognition screen” (figure 7) displayed the images and provided buttons for
participants to select an emotion category. In addition to the aforementioned seven
categories, two more choices were offered. The “Other…” choice allowed entry of
a term that, according to the participant, described the shown emotion best but
was not part of the categories offered. If none of the emotions offered appeared to
apply, and no other emotion could be named, the participant was able to choose
“Don't know”.
Figure 7: Recognition screen
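A per-trial record covering these data items might be structured as in the following sketch; the field names and the example values are our own, chosen to mirror the items listed above rather than the study's actual data format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrialResponse:
    """One participant judgement, mirroring the data items collected per image."""
    stimulus_type: str            # "photograph" or "virtual_head"
    facial_area_coding: str       # expression depicted by the facial areas (FACS AUs)
    expected_category: str        # emotion category the stimulus was built to show
    chosen_category: str          # category picked, or "Other" / "Don't know"
    other_label: Optional[str] = None   # free-text emotion if "Other..." was chosen

response = TrialResponse(
    stimulus_type="virtual_head",
    facial_area_coding="4C 7C 10A",
    expected_category="Disgust",
    chosen_category="Other",
    other_label="irritation",
)
print(response.chosen_category == response.expected_category)  # False
```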
On completion of the recognition task, the software presented the post-test
questionnaire (figure 8) to the participant. This collected various quantitative and
qualitative data, with a view to complementing the data collected during the
recognition task. The Next button was enabled only on completion of all rows.
Figure 8: Post-test questionnaire
6. Results
The overall number of pictures shown was 1624 (29 participants x 56 pictures per
participant). On average, a participant took 11 minutes to complete the experiment
including pre-test and post-test questionnaire. Results show that recognition rates
vary across emotion categories, as well as between the two conditions. Figure 9
summarises this:
Figure 9: Summary of recognition rates
Surprise, Fear, Happiness and Neutral show slightly higher recognition rates for
the photographs, while in categories Anger and Sadness the virtual faces are more
easily recognised than their photographic counterparts. Disgust stands out with a
very low score for the virtual faces (around 20%), in contrast to over 70% for the
corresponding photographs.
Overall, results clearly suggest that recognition rates for photographs (78.6%
overall) are significantly higher than those for virtual heads (62.2% overall). The
Mann-Whitney test confirms this, even at a significance level of 1%. However, a
closer look at the recognition rates of particular emotions reveals that all but one
emotion category have at least one Photograph-Virtual head pair with comparable
results, demonstrating that recognition was as successful with the virtual head as it
was with the directly corresponding photographs. Figure 10 shows recognition
rates for these “top” virtual heads in each category. Disgust still stands out as a
category with "poor" results for the virtual head.
Figure 10: Summary of recognition rates for selected images
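An analysis of this kind can be reproduced with standard tools, as in the following sketch using SciPy's Mann-Whitney U test; the per-participant scores shown are made-up placeholders, not the study's raw data.

```python
from scipy.stats import mannwhitneyu

# Per-participant recognition counts out of 28 items per condition.
# These numbers are illustrative placeholders, NOT the study's raw data.
photo_correct   = [24, 22, 25, 21, 23, 26, 22, 20, 24, 23]
virtual_correct = [18, 17, 20, 16, 19, 21, 17, 15, 18, 19]

photo_rate = sum(photo_correct) / (len(photo_correct) * 28)
virtual_rate = sum(virtual_correct) / (len(virtual_correct) * 28)
print(f"photographs: {photo_rate:.1%}, virtual heads: {virtual_rate:.1%}")

# Two-sided Mann-Whitney U test on the per-participant scores; the study
# reported a significant difference even at the 1% level.
stat, p = mannwhitneyu(photo_correct, virtual_correct, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
```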
Results also indicate that recognition rates vary significantly between participants.
The lowest scoring individual recognised 30 out of 56 emotions correctly (54%),
the highest score was 48 (86%). Those who achieved better results did so
consistently across virtual heads and photographs, whereas lower scoring
participants were more likely to fail to recognise the virtual heads than the
photographs.
The expressions of emotion identified as being most distinctive are shown below.
Each expression is coded according to FACS with corresponding action units.
Some action units are binary, i.e. they are applied or not, while other action units
have an associated intensity scoring. Intensity can vary from A (weakest) to E
(strongest). The study results would recommend use of these particular
expressions, or "exemplars", for models with a similarly limited number of
animation control parameters:
Figure 11: Most distinctive expressions
Surprise (AUs 1C 2C 5C 26) is a very brief emotion, shown mostly around the
eyes. Our "exemplary surprise face" features high raised eyebrows and raised
upper lids. The lower eyelids remain in the relaxed position. The open mouth is
relaxed, not tense. Unlike the typical human surprise expression, the virtual head
does not actually drop the jaw bone. The evidence is that this does not have an
adverse effect however, considering that 80% of all participants classified the
expression correctly.
Fear (AUs 1B 5C L10A 15A 25) usually has a distinctive appearance in all three
areas of the face [28]. The variation which proved to be most successful in our
study is characterised by raised, slightly arched eyebrows. The eyes are wide open
as in surprise and the lips are parted and tense. This is in contrast to the open but
relaxed "surprise" mouth. There is an asymmetry in that the left upper lip is
slightly raised.
Disgust (AUs 4C 7C 10A) is typically shown in the mouth and nose area [28].
The variation with the best results is characterised mainly by the raised upper lip
(AU10) together with tightened eyelids. It has to be stressed that disgust was the
least successful category with only 30% of the participants assigning this
expression correctly.
Our Anger face (AUs 2A 4B 7C 17B) features lowered brows that are drawn
together. Accordingly, the eyelids are tightened which makes the eyes appear to
be staring out in a penetrating fashion. Lips are pressed firmly together with the
corners straight, a result of the chin raiser AU17.
Happiness (AUs 12C 25) turned out to be easy to recognise - in most cases the lip
corner puller (AU12) is sufficient. In our exemplary face, the eyes are relaxed and
the mouth corners are pulled up. The virtual head does not allow change to the
cheek appearance, nor does it allow for wrinkles to appear underneath the eyes.
Such smiles without cheek or eye involvement are sometimes referred to as
non-enjoyment or "non-Duchenne" smiles, in contrast to the enjoyment smiles
described by the 19th century French neurologist Duchenne de Boulogne [cf. 31].
The Sadness expression (AUs 1D 4D 15A 25) that was most successful has
characteristic brow and eye features. The brows are raised in the middle while the
outer corners are lowered. This affects the eyes which are triangulated with the
inner corner of the upper lids raised. The slightly raised lower eyelid is not
necessarily typical [33] but, in this case, increases the sadness expression. The
corners of the lips are down.
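For readers who wish to reuse these exemplars, the sketch below collects the reported AU codings and parses the intensity letters and the "L" (left-side) prefix into numeric weights. The codes are those given above; the parsing logic and the numeric intensity scale are our own illustrative assumptions.

```python
# Exemplar expressions reported above, coded as FACS action units with optional
# intensity letters (A weakest .. E strongest) and an 'L' prefix for left-sided
# actions. The numeric weights assigned to the letters are an assumption.
EXEMPLARS = {
    "Surprise":  "1C 2C 5C 26",
    "Fear":      "1B 5C L10A 15A 25",
    "Disgust":   "4C 7C 10A",
    "Anger":     "2A 4B 7C 17B",
    "Happiness": "12C 25",
    "Sadness":   "1D 4D 15A 25",
}

INTENSITY = {"A": 0.2, "B": 0.4, "C": 0.6, "D": 0.8, "E": 1.0}

def parse_au_code(code: str) -> list[tuple[int, float, bool]]:
    """Return (AU number, weight, left_only) triples for one exemplar string."""
    parsed = []
    for token in code.split():
        left_only = token.startswith("L")
        token = token.lstrip("L")
        if token[-1].isalpha():
            au, weight = int(token[:-1]), INTENSITY[token[-1]]
        else:                       # no intensity letter given, assume full strength
            au, weight = int(token), 1.0
        parsed.append((au, weight, left_only))
    return parsed

print(parse_au_code(EXEMPLARS["Fear"]))
```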
6.1 Recognition errors
The errors made by participants when assigning expressions to categories are
presented in Table 2. The matrix shows which categories have been confused, and
compares virtual heads with photographs. Rows give the proportion of each
response to the intended category; each cell lists the value for the virtual head
condition followed by that for the photograph condition:
Table 2: Error matrix for emotion categorisation (virtual / photograph)

Category | Surprise | Fear | Disgust | Anger | Happiness | Sadness | Neutral | Other / Don't know
Surprise | .67 / .85 | .06 / .07 | .00 / .00 | .00 / .01 | .23 / .00 | .00 / .00 | .01 / .00 | .03 / .08
Fear | .15 / .19 | .41 / .73 | .00 / .04 | .30 / .00 | .03 / .00 | .03 / .00 | .02 / .00 | .06 / .03
Disgust | .01 / .02 | .02 / .00 | .22 / .77 | .39 / .14 | .01 / .00 | .04 / .00 | .10 / .01 | .21 / .07
Anger | .03 / .04 | .00 / .04 | .00 / .03 | .77 / .72 | .02 / .00 | .03 / .03 | .11 / .05 | .05 / .09
Happiness | .01 / .00 | .01 / .00 | .01 / .00 | .01 / .00 | .64 / .84 | .03 / .00 | .26 / .15 | .04 / .02
Sadness | .06 / .00 | .09 / .10 | .00 / .00 | .00 / .01 | .01 / .01 | .85 / .66 | .03 / .09 | .01 / .07
Neutral | .03 / .00 | .03 / .00 | .01 / .00 | .00 / .01 | .00 / .02 | .11 / .01 | .78 / .94 | .04 / .02
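A confusion matrix of this kind can be rebuilt from individual trial records, for example with pandas as sketched below; the column names and the handful of sample records are illustrative assumptions.

```python
import pandas as pd

# Minimal illustrative trial records: (stimulus type, expected, chosen).
trials = pd.DataFrame(
    [("virtual_head", "Disgust", "Anger"),
     ("virtual_head", "Disgust", "Disgust"),
     ("photograph",   "Disgust", "Disgust"),
     ("virtual_head", "Fear",    "Anger"),
     ("photograph",   "Fear",    "Fear")],
    columns=["stimulus_type", "expected", "chosen"],
)

# Row-normalised confusion matrix per condition, i.e. the proportion of each
# response given to each intended category (as in Table 2).
for condition, group in trials.groupby("stimulus_type"):
    matrix = pd.crosstab(group["expected"], group["chosen"], normalize="index")
    print(condition)
    print(matrix.round(2), "\n")
```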
Disgust and Anger
Table 2 shows that the majority of confusion errors were made in the category
Disgust, an emotion frequently confused with Anger. When examining results for
virtual heads only, anger (39%) was picked almost twice as often as disgust
(22%). Further, with faces showing disgust, participants often felt unable to select
any given category and instead picked “Don’t know”, or suggested an alternative
emotion. These alternatives were for example aggressiveness, hatred, irritation,
or self-righteousness.
Ekman and Friesen [28] describe disgust (or contempt) as an emotion that often
carries an element of condescension toward the object of contempt. People feeling
disgusted by other people, or their behaviour, tend to feel morally superior to
them. Our observations confirm this tendency, for where “other” was selected
instead of the expected “disgust”, the suggested alternative was often in line with
Ekman and Friesen's interpretation.
Fear and Surprise
The error matrix (Table 2) further reveals that Fear was often mistaken for
Surprise, a tendency that was also observed in several other studies (see [34]). It is
stated that a distinction between the two emotions can be observed with high
certainty only in "literate" cultures, but not in "pre-literate", visually isolated
cultures. Social psychology states that experience and therefore expression of fear
and surprise often happen simultaneously, such as when fear is felt suddenly due
to an unexpected threat [28]. The appearance of fear and surprise is also similar,
with fear generally producing a more tense facial expression. However, fear
differs from surprise in three ways:
1. Whilst surprise is not necessarily pleasant or unpleasant, even mild fear is
unpleasant.
2. One can be afraid of something familiar that is certainly going to happen
(for example a visit to the dentist), whereas something familiar or expected
can hardly be surprising.
3. Whilst surprise usually disappears as soon as it is clear what the surprising
event was, fear can last much longer, even when the nature of the event is
fully known.
These indicators make it possible to differentiate whether a person is afraid or surprised. All
three have to do with context and timing of the fear-inspiring event – factors that
are not perceivable from a still image.
In accordance with this, Poggi and Pélachaud [56] found that emotional
information is not only contained in the facial expression itself, but also in the
performatives of a communicative act: suggesting, warning, ordering, imploring,
approving and praising. Similarly, Bartneck [49] observed significantly higher
recognition rates when still images of facial expressions were shown in a dice
game context, compared to display without any context. In other words, the
meaning and interpretation of an emotional expression can depend on the situation
in which it is shown. This strongly suggests that in situations where the facial
expression is animated or displayed in context, recognition rates can be expected
to be higher.
Fear and Anger
The relationship between Fear and Anger is similar to that between fear and
surprise. Both can occur simultaneously, and their appearance often blends. What
is striking is that all confusions were made with virtual faces, whilst not even one
of the fear photographs was categorised as anger. This may suggest that the fear
category contained some relatively unsuitable examples of modelled facial
expressions. An examination of the results shows that there was one artefact in
particular that was regularly mistaken for anger.
Figure 12 shows an expression with the appearance of the eyes being
characteristic of fear. The lower eyelid is visibly drawn up and appears to be very
tensed. Both eyebrows are slightly raised and drawn together. The lower area of
the face also shows clear characteristics of fear, such as the slightly opened mouth
with stretched lips that are drawn together. In contrast, an angry mouth has the
lips either pressed firmly together or open in a “squarish” shape, as if to shout.
Figure 12: Fear expression, variation A
However, 18 out of 29 times this expression was categorised as anger. In anger, as
in fear, eyebrows can be drawn together. But unlike the fearful face which shows
raised eyebrows, the angry face features a lowered brow. Generally, we have
found that subtle changes to upper eyelids and brows had a significant effect on
the expression overall, which is in line with findings for real life photographs
[28].
The eyebrows in Figure 12 are only slightly raised from the relaxed position, but
perhaps not enough to give the desired impression. Another confusing indicator is
the furrowed shape of the eyebrows, since a straight line or arched brows are more
typical for fear.
Figure 13: Fear expression, variation B
In contrast, in Figure 13 the expression is identical to the expression in Figure 12
apart from the eyebrows, which are now raised and arched, thereby changing the
facial expression significantly and making it less ambiguous and distinctively
“fearful”.
6.2 Post-experiment questionnaire results
After completing the recognition task participants were asked to complete a
questionnaire and were invited to comment on any aspect of the experiment.
Responses to the latter are discussed in the next section of this paper. The
questionnaire comprised eleven questions, each one answered on a scale from 0-4
with 0 being total disagreement and 4 being total agreement. Table 3 below shows
average values per question.
Table 3: Post-experiment questionnaire results (0 = disagree, 4 = agree)

No. | Statement | Score
1 | The interface was easy to use. | 3.8
2 | More emotion categories would have been better. | 2.3
3 | Emotions were easy to recognise. | 2.2
4 | The real people showed natural emotions. | 2.6
5 | I responded emotionally to the pictures. | 2.0
6 | It was difficult to find the right category. | 1.9
7 | The "recognisability" of the emotions varied a lot. | 2.5
8 | The real-life photographs looked posed. | 2.2
9 | The choice of emotions was sufficient. | 2.9
10 | Virtual faces showed easily recognisable emotions. | 2.7
11 | The virtual head looked alienating. | 2.9
7. Discussion and Conclusions
The experiment followed standard practice for expression recognition experiments
by preparing the six universal emotions as pictures of avatar faces or photographs
of real human faces, and showing these to participants who are asked to say what
emotion they think each photograph or picture portrays [57].
selected from the databank "Pictures of Facial Affect" solely based on their high
recognition rates. This was believed to be the most appropriate method, aiming to
avoid the introduction of factors that would potentially disturb results, such as
gender, age or ethnicity.
Further, the photographs are considered standardised facial expressions of
emotions and exact AU coding is available for them. This ensures concurrent
validity, since performance in one test (virtual head) is related to another, well-
reputed test (FACS coding and recognition). Potential order effects induced by
the study's repeated measures design were neutralised by presenting the artefacts of
the two conditions in a mixed random order.
Further confidence in the results derives from the fact that participants found the
interface easy to use (table 3 statement 1), implying that results were not distorted
by extraneous user interface factors. Similarly, although participants tended to feel
that the photographs looked posed (table 3 statement 8), they nevertheless tended
to see them as showing real emotion (table 3 statement 4). Again, despite some
ambivalence in the matter (table 3 statements 2 and 9), participants were on the
whole happy with the number of categories of emotion offered in the experiment.
This is not unexpected, since the facial expressions showed only emotions from the
offered range, and it supports the validity of our results. However, the slight
agreement with statement 2 indicates that more categories could potentially have
given participants more satisfaction when making their choice. Two participants noted in
their comments, explicitly, that they would have preferred a wider choice of
categories.
Having established the validity of the experimental procedure and results, an
important conclusion to be drawn is that the approach of applying the reduced
FACS model to virtual face representations is not guaranteed to work for all
expressions, or all variations of a particular emotion category. This is implied by
the finding that recognition rates for the photographs were significantly higher
than those for the virtual heads (section 6). Further evidence is supplied in the
post-experiment questionnaire data. Two participants, for example, noted that on
several occasions the virtual face expression was not distinctive enough, and two
that the virtual head showed no lines or wrinkles and that recognition might have
been easier with these visual cues.
Nevertheless, our data also suggests that, when applying the FACS model to
virtual face representations, emotions can effectively be visualised with a very
limited number of facial features and action units. For example, in respect of the
“top scoring” virtual heads, emotion recognition rates are, with the exception of
the “disgust” emotion, comparable to those of their corresponding real-life
photographs. These top-scoring expressions are exemplar models for which
detailed AU scoring is available. They therefore potentially build a basis for
emotionally expressive avatars in collaborative virtual environments and hence
for the advantages of emotionally enriched CVEs argued for earlier.
No categorisation system can ever be complete. Although accepted categories
exist, emotions can vary in intensity and inevitably there is a subjective element to
recognition. When modelling and animating facial features, however, our results
suggest that such ambiguity in interpretation can be minimised by focussing on,
and emphasising, those visual clues that are particularly distinctive. Although it
remains to be corroborated through further studies, it is believed that such simple,
pure emotional expressions could fulfil a useful role in displaying explicit,
intended communicative acts which can therefore help interaction in a CVE. They
can provide a basis for emotionally enriched CVEs, and hence for the benefits of
such technology being used, for example, within distance learning as argued for
earlier.
It should perhaps be noted, however, that such pure forms of emotion are not
generally seen in real life, as many expressions occurring in face-to-face
communication between humans are unintended or automatic reactions. They are
often caused by a complex interaction of several simultaneous emotions, vividly
illustrated in Picard's example of a Marathon runner who, after winning a race,
experiences a range of emotions: “tremendously happy for winning the race,
surprised because she believed she would not win, sad that the race was over, and
a bit fearful because during the race she had acute abdominal pain” [21].
With regards to our own work, such instinctive reactions could be captured and
used to control an avatar directly, potentially allowing varying intensities and
blends of facial expressions to be recognised and modelled onto avatar faces.
However, this study has deliberately opted for an avatar that can express clearly,
and unambiguously, what the controlling individual exactly wants it to express,
since this is one way in which people may want to use CVE technology.
Another issue concerns consistency. Social psychology suggests, as do our own
findings, that an emotion’s recognisability depends on how consistently it is
shown on a face. Further, most emotions, with the exception of sadness, become
clearer and more distinctive when their intensity increases. There are indications
that in cases where the emotion appeared to be ambiguous at first, the photographs
contained subtle clues as to what emotion is displayed, enabling the viewer to
assign it after closer inspection. These clues appear to be missing in the virtual
head artefacts, suggesting the need to either emphasise distinctive and
unambiguous features, or to enhance the model by adding visual cues that help
identify variations of emotion more clearly. For further work on emotions in real-
time virtual environment interactions the authors aim to concentrate on the
former.
Overall, it should be noted that many of the artefacts classified by participants under
the “Other…” choice were actually close to the expected emotion category,
confirming that the facial expressions in those cases were not necessarily badly
depicted. This highlights the importance of having a well-defined vocabulary
when investigating emotions - a problem that is not new to the research
community and that has been discussed at length over the years (see [33] for an
early comparison of emotion dimensions vs. categories; see also [32,58]).
The experimental work discussed in this paper provides strong evidence that
creating avatar representations based on the FACS model, but using only a limited
number of facial features, allows emotions to be effectively conveyed, giving rise
to recognition rates that are comparable with those of the corresponding real-life
photographs. Effectiveness has been demonstrated through good recognition rates
for all but one of the emotion categories, and efficiency has been established since
a reduced feature set was found to be sufficient to build a successfully recognised
core set of avatar facial expressions.
In consequence, the top-scoring expressions illustrated earlier may be taken to
provide a sound basis for building emotionally expressive avatars to represent
users (which may in fact be agents), in CVEs. When modelling and animating
facial features, potential ambiguity in interpretation can be minimised by
focussing on, and emphasising, particularly distinctive visual clues of a particular
emotion. We have proposed a set of expressions that fulfil this. These are not
necessarily the most distinctive clues for a particular emotion as a whole, but
those that we found to be very distinctive for that emotion category.
8. Further work
It is planned to extend the work in a variety of ways. The data reveals that certain
emotions were confused more often than others, most notably Disgust and Anger.
This was particularly the case for the virtual head expressions. Markham and
Wang [59] observed a similar link between these two emotions when showing
photographs of faces to children. Younger children (aged 4-6) in particular tended
to group certain emotions together, while older children (aged 10+) were found to
typically have the ability to differentiate correctly. In view of the findings from
the current study, this may indicate that although adults can differentiate emotions
well in day-to-day social interaction, the limited clues provided by the virtual
head make observers revert to a less experience-based, more instinct-based
manner of categorising them. However, more work will be necessary to
investigate this possibility.
Two other studies also found Disgust often confused with Anger [49,60] and
concluded that the lack of morph targets, or visual clues, around the nose was a
likely cause. In humans, Disgust is typically shown around the mouth and nose
[28] and although our model features a slightly raised lip (AU10), there is no
movement of the nose. This strongly suggests that to improve distinctiveness of
the Disgust expression in a real-time animated model, the nose should be included
in the animation, as should the relevant action unit AU9 which is responsible for
“nose wrinkling”. Given this, we have now developed an animated model of the
virtual head that is capable of lifting and wrinkling the nose to express Disgust.
The experimental results, in particular the relatively high number of “Other” and
“Don’t know” responses, indicate that limiting the number of categories of
emotion might have had a negative effect on the recognition success rates. It
might be that allowing more categories, and/or offering a range of suitable
descriptions for an emotion category (such as Joy, Cheerfulness and Delight, to
complement Happiness), would yield still higher recognition rates, and future
experiments will address this.
Similarly, although concentrating on the face as the primary channel for
conveying emotions, the work must be seen in a wider context in which the entire
humanoid representation of a user can in principle act as the communication
device in CVEs. The experiments discussed here set the foundation for further
work on emotional postures and the expression of attitude through such a virtual
embodiment, drawing for example on the work of [61] on posture, [62] on
gestures, or [4] on spatial behaviour and gestures.
A further contextual aspect of emotional recognition concerns the conversational
milieu within which emotions are expressed and recognised. Context plays a
crucial role in emotion expression and recognition - effective, accurate mediation
of emotion is closely linked with the situation and other, related, communicative
signals. A reliable interpretation of facial expressions, which fails to take
cognisance of the context in which they are displayed, is often not possible. One
would expect, therefore, that recognition of avatar representations of emotion will
be higher when contextualised. This assumption requires empirical investigation,
however, and future experiments are planned to address this.
Bartneck [49] distinguishes between the recognisability of a facial expression of
emotion and its "convincingness", seeing the latter as more important. Further
experimental work will enable study of how this distinction plays itself out in a
virtual world, where timing is predicted to affect "convincingness". For example,
showing surprise over a period of, say, a minute would, at the very least, send
confusing or contradictory signals. It will also be possible to
investigate this and, more generally, what impact the mediation of emotions has
on the conversational interchanges.
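As a concrete illustration of the timing point, the sketch below applies a simple onset-apex-offset intensity envelope so that an expression such as Surprise decays back to neutral after a couple of seconds rather than being held indefinitely. The durations and the function itself are assumptions made purely for illustration and are not part of the system reported here.

```python
# Illustrative sketch only: an onset-apex-offset envelope limiting how long an
# expression is held at full intensity. Durations are hypothetical defaults.

def expression_intensity(t, onset=0.2, apex=1.5, offset=0.5):
    """Return expression intensity in [0, 1] at time t seconds after triggering.

    onset:  seconds to ramp up from neutral to full intensity
    apex:   seconds the expression is held at full intensity
    offset: seconds to decay back to neutral
    """
    if t < 0:
        return 0.0
    if t < onset:
        return t / onset                              # ramping up
    if t < onset + apex:
        return 1.0                                    # held at apex
    if t < onset + apex + offset:
        return 1.0 - (t - onset - apex) / offset      # decaying to neutral
    return 0.0                                        # expression has ended

# With these defaults Surprise has faded out after roughly 2.2 seconds,
# rather than being held implausibly for a full minute.
```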
A further contextual issue concerns culture. Although emotions exist universally,
there can be cultural differences concerning when emotions are displayed [32]. It
appears that people in various cultures differ in what they have been taught about
managing or controlling their facial expression of emotion. Ekman and Friesen
[28] call these cultural norms “display rules”. Display rules prescribe whether,
and if so when, an emotion is supposed to be fully expressed, masked, attenuated or
intensified. For instance, it has been observed that Japanese men are often
reluctant to show unpleasant emotions in the physical presence of others.
Interestingly, these cultural differences can also affect the recognition of
emotions. In particular, Japanese people reportedly have more difficulty than
others recognising negative expressions of emotions, an effect that may reflect a
lack of perceptual experience with such expressions because of the cultural
proscriptions against displaying them [32]. How such cultural differences might
play themselves out in a virtual world is an important open question.
Finally, the authors wish to explore how the results concerning the mediation of
emotions via avatars might be beneficially used to help people with autism. A
commonly, if not universally, held view of the nature of autism is that it involves
a “triad of impairments” [63]. First, there is a social impairment: the person with
autism finds it hard to relate to, and empathise with, other people. Secondly, there
is a communication impairment: the person with autism finds it hard to understand
and use verbal and non-verbal communication. Finally, there is a tendency to
rigidity and inflexibility in thinking, language and behaviour. Much current
thinking is that this triad is underpinned by a “theory of mind deficit” - people
with autism may have a difficulty in understanding mental states and in ascribing
them to themselves or to others.
CVE technology of the sort discussed in this paper could potentially provide a
means by which people with autism might communicate with others (autistic or
non-autistic) and thus circumvent their social and communication impairments and
sense of isolation. Further, beyond this prosthetic role, the technology can also
be used for practice and rehearsal. For this to help combat any theory
of mind problem, users would need to be able to recognise the emotions being
displayed via the avatars. The findings reported in the current paper give grounds
for confidence that the technology will be useful in such a role, but this needs to
be investigated in practice [cf. 64,65].
Much remains to be investigated, therefore, concerning the educational use of the
emerging CVE technology. It is hoped that the work reported in this paper will
help set the foundation for further work on the mediation of emotions in virtual
worlds.
Acknowledgements Photographs from the CD-Rom Pictures of Facial Affect
[35] used with permission. Original virtual head geometry by Geometrek.
Detailed results of this study as well as the virtual head prototypes are available
online at http://www.leedsmet.ac.uk/ies/comp/staff/mfabri/emotion
References
1. Thalmann D. The Role of Virtual Humans in Virtual Environment Technology and Interfaces. In:
Frontiers of Human-Centred Computing, Online Communities and Virtual Environments. Earnshaw R,
Guedj R, Vince J eds. London: Springer Verlag. 2001
2. Fleming B, Dobbs D. Animating Facial Features and Expressions. Charles River Media: Boston 1999
3. Dumas C, Saugis G, Chaillou C, Degrande S, Viaud M. A 3-D Interface for Cooperative Work. In:
Collaborative Virtual Environments 1998 Proceedings. Manchester. 1998
4. Manninen T, Kujanpää T. Non-Verbal Communication Forms in Multi-player Game Sessions. In:
People and Computers XVI – Memorable Yet Invisible. Faulkner X, Finlay J, Détienne F eds. London:
BCS Press. ISBN 1852336595. 2002
5. Atkins H, Moore D, Hobbs D, Sharpe S. Learning Style Theory and Computer Mediated
Communication. In: ED-Media 2001 Proceedings. 2001
6. Laurillard D. Rethinking University Teaching. Routledge:London 1993
7. Moore M. Three Types of Interaction. In: Distance Education: New Perspectives. Harry K, John M,
Keegan D eds. London: Routledge. 1993
8. Johnson D, Johnson R. Cooperative learning in the culturally diverse classroom. In: Cultural Diversity in
Schools. DeVillar, Faltis, Cummins eds. Albany: State University of New York Press. 1994
9. Webb N. Constructive Activity and Learning in Collaborative Small Groups. Educational Psychology
1995; 87(3); 406-423
10. Wu A, Farrell R, Singley M. Scaffolding Group Learning in a Collaborative Networked Environment.
In: CSCL 2002 Proceedings. Boulder, Colorado. 2002
11. Lisetti C, Douglas M, LeRouge C. Intelligent Affective Interfaces: A User-Modeling Approach for
Telemedicine. In: Proceedings of International Conference on Universal Access in HCI. New Orleans,
LA. Elsevier Science Publishers. 2002
12. Daly-Jones O, Monk A, Watts L. Some advantages of video conferencing over high-quality audio
conferencing: fluency and awareness of attentional focus. Int. Journal of Human-Computer Studies
1998; 49(1); 21-58
13. McShea J, Jennings S, McShea H. Characterising User Control of Video Conferencing in Distance
Education. In: CAL-97 Proceedings. Exeter University. 1997
14. Fabri M, Gerhard M. The Virtual Student: User Embodiment in Virtual Learning Environments. In:
International Perspectives on Tele-Education and Virtual Learning Environments. Orange G, Hobbs D
eds. Ashgate 2000
15. Knapp M. Nonverbal Communication in Human Interaction. Holt Rinehart Winston: New York 1978
16. Morris D, Collett P, Marsh P, O'Shaughnessy M. Gestures, their Origin and Distribution. Jonathan Cape:
London 1979
17. Argyle M. Bodily Communication (second edition). Methuen: New York 1988
18. Strongman K. The Psychology of Emotion (fourth edition). Wiley & Sons: New York 1996
19. Dittrich W, Troscianko T, Lea S, Morgan D. Perception of emotion from dynamic point-light displays
presented in dance. Perception 1996; 25; 727-738
20. Keltner D. Signs of appeasement: evidence for the distinct displays of embarrassment, amusement and
shame. Personality and Social Psychology 1995; 68(3); 441-454
21. Picard R. Affective Computing. MIT Press 1997
22. Lisetti C, Schiano D. Facial Expression Recognition: Where Human-Computer Interaction, Artificial
Intelligence and Cognitive Science Intersect. Pragmatics and Cognition 2000; 8(1); 185-235
23. Damásio A. Descartes’ Error: Emotion, Reason and the Human Brain. Avon: New York 1994
24. Cooper B, Brna P, Martins A. Effective Affective in Intelligent Systems – Building on Evidence of
Empathy in Teaching and Learning. In: Affective Interactions: Towards a New Generation of Computer
Interfaces. Paiva A ed. London: Springer Verlag. 2000
25. Johnson W. Pedagogical Agents. In: Computers in Education Proceedings. Beijing, China. 1998
26. McGrath A, Prinz W. All that Is Solid Melts Into Software. In: Collaborative Virtual Environments -
Digital Places and Spaces for Interaction. Churchill, Snowdon, Munro eds. London: Springer. 2001
27. Durlach N, Slater M. Meeting People Virtually: Experiments in Shared Virtual Environments. In: The
Social Life of Avatars. Schroeder R ed, London: Springer Verlag. 2002
28. Ekman P, Friesen W. Unmasking the Face. Prentice Hall: New Jersey 1975
29. New Oxford Dictionary of English. Oxford University Press 2001
30. Russell J, Fernández-Dols J. The Psychology of Facial Expression. Cambridge University Press 1997
31. Surakka V, Hietanen J. Facial and emotional reactions to Duchenne and non-Duchenne smiles.
International Journal of Psychophysiology 1998; 29(1); 23-33
32. Zebrowitz L. Reading Faces: Window to the Soul? Westview Press: Boulder, Colorado 1997
33. Ekman P, Friesen W, Ellsworth P. Emotion in the Human Face: Guidelines for Research and an
Integration of Findings. Pergamon Press: New York 1972
34. Ekman P. Facial Expressions. In: Handbook of Cognition and Emotion. Dalgleish T, Power M eds. New
York: Wiley & Sons. 1999
35. Ekman P, Friesen W. Pictures of Facial Affect CD-Rom. University of California, San Francisco. 1975
36. Ekman P, Friesen W. Facial Action Coding System. Consulting Psychologists Press 1978
37. Bartlett M. Face Image Analysis by Unsupervised Learning and Redundancy Reduction (Ph.D. Thesis).
University of California, San Diego. 1998
38. Pélachaud C, Badler N, Steedman M. Generating Facial Expressions for Speech. Cognitive Science
1996; 20(1); 1-46
39. Ekman P, Rosenberg E eds. What the Face Reveals: Basic and Applied Studies of Spontaneous
Expression Using the Facial Action Coding System. Oxford University Press. 1998
40. Terzopoulos D, Waters K. Analysis and synthesis of facial image sequences using physical and
anatomical models. Pattern Analysis and Machine Intelligence 1993; 15(6); 569-579
41. Platt S, Badler N. Animating facial expression. ACM SIGGRAPH 1981; 15(3); 245-252
42. Parke F. Parameterized modeling for facial animation. IEEE Computer Graphics and Applications 1982;
2(9); 61-68
43. Benford S, Bowers J, Fahlén L, Greenhalgh C, Snowdon D. User Embodiment in Collaborative Virtual
Environments. In: CHI 1995 Proceedings. Denver, Colorado: ACM Press 1995
44. Mori M. The Buddha in the Robot. Tuttle Publishing 1982
45. Reichardt J. Robots: Fact, Fiction and Prediction. London: Thames & Hudson 1978
46. Hindmarsh J, Fraser M, Heath C, Benford S. Virtually Missing the Point: Configuring CVEs for Object-
Focused Interaction. Collaborative Virtual Environments: Digital Places and Spaces for Interaction.
Churchill E, Snowdon D, Munro A eds. London: Springer Verlag. 2001
47. Godenschweger F, Strothotte T, Wagener H. Rendering Gestures as Line Drawings. Gesture Workshop
1997. Bielefeld, Germany. Springer Verlag 1997
48. Donath J. Mediated Faces. Cognitive Technology: Instruments of Mind. Beynon M, Nehaniv C,
Dautenhahn K eds. Warwick, UK. 2001
49. Bartneck C. Affective Expressions of Machines. In: CHI 2001 Proceedings. Seattle, USA. 2001
50. Ellis H. Developmental trends in face recognition. The Psychologist: Bulletin of the British
Psychological Society 1990; 3; 114-119
51. Dittrich W. Facial motion and the recognition of emotions. Psychologische Beiträge 1991; 33(3/4); 366-
377
52. H-Anim Working Group. Specification for a Standard VRML Humanoid. http://www.h-anim.org
53. Yacoob Y, Davis L. Computing spatio-temporal representations of human faces. In: Computer Vision
and Pattern Recognition Proceedings. IEEE Computer Society 1994
54. Essa I, Pentland A. Coding, Analysis, Interpretation, and Recognition of Facial Expressions. In: IEEE
Transactions on Pattern Analysis and Machine Intelligence. 1995
55. Neisser U. Cognition and Reality. Freeman: San Francisco 1976
56. Poggi I, Pélachaud C. Emotional Meaning and Expression in Animated Faces. In: Affective Interactions:
Towards a New Generation of Computer Interfaces. Paiva A ed. London: Springer Verlag. 2000
57. Rutter D. Non-verbal communication. In: The Blackwell Dictionary of Cognitive Psychology. Eysenck
M ed. Blackwell Publishers: Oxford. 1990
58. Wehrle T, Kaiser S. Emotion and Facial Expression. In: Affective Interactions: Towards a New
Generation of Computer Interfaces. Paiva A ed. London: Springer Verlag. 2000
59. Markham R, Wang L. Recognition of emotion by Chinese and Australian children. Cross-Cultural
Psychology 1996; 27(5); 616-643
60. Spencer-Smith J, Innes-Ker A, Wild H, Townsend J. Making Faces with Action Unit Morph Targets.
AISB'02 Symposium on Animating Expressive Characters for Social Interactions. ISBN 1902956256.
London. 2002
61. Coulson M. Expressing emotion through body movement: A component process approach. In: Artificial
Intelligence and Simulated Behaviour Proceedings. Imperial College, London. 2002
62. Capin T, Pandzic I, Thalmann N, Thalmann D. Realistic Avatars and Autonomous Virtual Humans in
VLNET Networked Virtual Environments. In: Virtual Worlds on the Internet. Earnshaw R, Vince J eds.
IEEE Computer Science Press. 1999
63. Wing L. Autism Spectrum Disorders. Constable 1996
64. Moore D, McGrath P, Thorpe J. Computer Aided Learning for People with Autism – A Framework for
Research and Development. Innovations in Education and Training International 2000; 37(3); 218-228
65. Moore D, Taylor J. Interactive multimedia systems for people with autism. Educational Media 2001;
25(3); 169-177