Affect bursts to constrain the meaning of the
facial expressions of the humanoid robot Zeno
Bob R. Schadenberg, Dirk K. J. Heylen, and Vanessa Evers
Human-Media Interaction, University of Twente, Enschede, the Netherlands
{b.r.schadenberg, d.k.j.heylen, v.evers}@utwente.nl
Abstract. When a robot is used in an intervention for autistic children
to learn emotional skills, it is particularly important that the robot’s
facial expressions of emotion are well recognised. However, recognising
what emotion a robot is expressing, based solely on the robot’s facial
expressions, can be difficult. To improve the recognition rates, we added affect bursts to two sets of facial expressions, one caricatured and one more humanlike, using Robokind's R25 Zeno robot. Twenty-eight typically developing children participated in this study. We found no significant difference between the two sets of facial expressions. However, the addition of affect bursts significantly improved the recognition rates of the emotions by helping to constrain the meaning of the facial expressions.
Keywords: emotion recognition, affect bursts, facial expressions, hu-
manoid robot.
1 Introduction
The ability to recognise emotions is impaired in individuals with Autism Spec-
trum Condition [1], a neurodevelopmental condition characterised by difficulties in social communication and interaction, and by rigid and repetitive patterns of behaviour [2]. Recognising
emotions is central to success in social interaction [3], and due to impairment in
this skill, autistic individuals often fail to accurately interpret the dynamics of
social interaction. Learning to recognise the emotions of others may provide a
toehold for the development of more advanced emotion skills [4], and ultimately
improve social competence.
In the DE-ENIGMA project, we aim to develop a novel intervention for teach-
ing emotion recognition to autistic children with the help of a humanoid robot –
Robokind’s R25 model called Zeno. The intervention is targeted at autistic chil-
dren who do not recognise facial expressions, and who may first need to learn to
pay attention to faces and recognise the facial features. Many of these children
will have limited receptive language, and may have lower cognitive ability. The
use of a social robot in an intervention for autistic children is believed to improve
the interest of the children in the intervention and provide them with a more
understandable environment [5].
The emotions that can be modelled with Zeno's expressive face can be difficult to recognise, even by typically developing individuals [6–8]. This can be partly attributed to the limited degrees of freedom of Zeno's expressive face, which results in emotional facial expressions that may not be legible, but more importantly to the fact that facial expressions are inherently ambiguous when they are not embedded in a situational context [9]. Depending on the situational context, the same facial expression can signal different emotions [10]. However, typically developing children only start using situational cues to interpret facial expressions consistently around the age of 8 or 9 [11]. Developmentally, the ability to use the
situational context is an advanced step in emotion recognition, whereas many
autistic children still need to learn the basic steps of emotion recognition. To this
end, we require a developmentally appropriate manner to constrain the mean-
ing of Zeno’s facial expressions during the initial steps of learning to recognise
emotions.
In the study reported in this paper, we investigate whether multimodal emotional expressions lead to better recognition rates by typically developing children than unimodal facial expressions. We tested two sets of facial expres-
sions, with and without non-verbal vocal expressions of emotion. One set of
facial expressions was designed by Salvador, Silver, and Mahoor [7], while the
other set is Zeno’s default facial expressions provided by Robokind. The lat-
ter are caricatures of human facial expressions, which we expect will be easier
to recognise than the more realistic humanlike facial expressions of Salvador et
al. [7]. Furthermore, we expect that the addition of non-verbal vocal expressions
of emotion will constrain the meaning of the facial expressions, making them
easier to recognise.
2 Related Work
Typically developing infants initially learn to discriminate the affect of another
through multimodal stimulation [12], which is one of the first steps in emo-
tion recognition. Discriminating between affective expressions through unimodal
stimulation develops afterwards. Multimodal stimulation is believed to be more
salient to young infants and therefore more easily draws their attention. In the
design of legible robotic facial expressions, multimodal expressions are often used
to improve recognition rates. Costa et al. [6] and Salvador et al. [7] added emo-
tional gestures to constrain the meaning of the facial expressions of the Zeno
R50 model, which has a face similar to the Zeno R25 model, and validated them
with typically developing individuals. The emotions joy, sadness, and surprise
seem to be well recognised by typically developing individuals, with recognition
rates of over 75%. However, the emotions anger, fear, and disgust were more
difficult to recognise with recognition rates ranging from 45% to the point of
guessing (17%). While the addition of gestures improved the recognition rates for Costa et al. [6], the results were mixed for Salvador et al. [7], where the emotional gestures improved the recognition of some emotions and decreased it for others.
The ability of emotional gestures to help constrain the meaning of facial
expressions of emotions is dependent on the body design of the robot. Whereas
the Zeno R50 model can make bodily gestures that resemble humanlike gestures
fairly well, the Zeno R25 model is very limited in its bodily capabilities, due to the limited degrees of freedom in its body and joints that rotate differently from human joints. This makes it particularly difficult to design body postures or
gestures that match humanlike expressions of emotion.
In addition to expressing emotions through facial expressions, bodily pos-
tures, or gestures, emotions are also expressed using vocal expressions [13]. In
human-human interaction, these vocal expressions of emotions can constrain the
meaning of facial expressions [14]. Affect bursts are a specific type of vocal expression of emotion, defined as “short, emotional non-speech expressions,
comprising both clear non-speech sounds (e.g. laughter) and interjections with
a phonemic structure (e.g. “Wow!”), but excluding “verbal” interjections that
can occur as a different part of speech (like “Heaven!”, “No!”, etc.)” [15, p. 103].
When presented in isolation, affect bursts can be an effective means of conveying
an emotion [15, 16].
3 Design Implementation
3.1 Facial expressions
In this study, we used Robokind’s R25 model of the child-like robot Zeno. The
main feature of this robot is its expressive face, which can be used to model
emotions. It has five degrees of freedom in its face, and two in its neck.
For the facial expressions (see Figure 1), we used Zeno’s default facial expres-
sions provided by Robokind, and the facial expressions developed by Salvador et
al. [7], which we will refer to as the Denver facial expressions. The Denver facial
expressions have been modelled after the facial muscle movements underlying
human facial expressions of emotions, as defined by the Facial Action Coding
System [17], and contain the emotions joy, sadness, fear, anger, surprise, and
disgust. Although the Denver facial expressions were designed for the Zeno R50 model, the R25 has a similar face; thus, we did not need to alter them.
Zeno’s default facial expressions include joy, sadness, fear, anger, and sur-
prise, but not disgust. Compared to the Denver facial expressions, the default
facial expressions are caricatures of human facial expressions of emotion. Additionally, the default expressions for fear and surprise include a temporal dimension: for fear, the eyes move back and forth from one side to the other, and surprise contains eye blinks.
Both the Denver and default facial expressions last 4 seconds, including a 0.5-second ramp-up and a 0.5-second return to the neutral expression. This leaves the participants with enough time to look at and interpret the facial expression.
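As an illustration, a minimal sketch of this timing envelope in Python (our own illustration; set_face_pose() and the pose dictionaries are hypothetical placeholders, not actual Robokind R25 API calls):

```python
import time

def play_expression(set_face_pose, neutral_pose, target_pose, steps=25):
    """Play a 4-second expression: 0.5 s ramp-up, ~3 s hold, 0.5 s ramp-down."""
    ramp, hold = 0.5, 3.0  # seconds, as described above

    def blend(a, b, t):
        # Linear interpolation between two poses (joint name -> position).
        return {joint: a[joint] + (b[joint] - a[joint]) * t for joint in a}

    # Ramp up from the neutral pose to the full expression.
    for i in range(1, steps + 1):
        set_face_pose(blend(neutral_pose, target_pose, i / steps))
        time.sleep(ramp / steps)

    time.sleep(hold)  # hold the expression so the child can inspect it

    # Ramp back down to the neutral pose.
    for i in range(1, steps + 1):
        set_face_pose(blend(target_pose, neutral_pose, i / steps))
        time.sleep(ramp / steps)
```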
Fig. 1. Zeno’s default and Denver facial expressions for joy, sadness, anger, fear, surprise, and disgust. The default facial expressions do not cover disgust.
3.2 Affect bursts
The affect bursts¹ were expressed by an adult Dutch-speaking female actor. After the initial briefing, the Denver facial expressions were shown to the actor to make it easier for her to act as the robot. Furthermore, showing the facial expressions made the actor aware of the constraints posed by the expressions. After each facial expression, the actor expressed an affect burst that matched the emotion shown in Zeno’s facial expression. The affect bursts were recorded using
the on-board microphone of a MacBook Pro Retina laptop and last 0.7 to 1.3
seconds. To improve the audio quality, the affect bursts were played through a
Philips BT2500 speaker placed on Zeno’s back.
¹ https://goo.gl/ztbMxw
4 Methodology
4.1 Participants
The study took place during a school trip to the University of Twente, where the participants could freely choose in which of several experiments to participate. The experiments were set up in a large open room, in which each experiment was separated by room dividers on two sides. Of the children who joined the school trip, 28 typically developing children (19 female, 9 male) between the ages of 9 and 12 (M = 10.1, SD = 0.9) participated in the experiment.
4.2 Research design
This study used a 2x2 mixed factorial design, with the set of facial expressions as a within-subjects variable and the addition of affect bursts as a between-subjects variable. The control (visual) condition consisted of 13 participants who
only saw the facial expressions. The 15 participants in the experimental (audio-
visual) condition saw the facial expressions combined with the corresponding
affect bursts. All participants saw both the Denver facial expressions and the
default facial expressions.
4.3 Procedure
The study started with the experimenter explaining the task and the goal of the
study. If there were no further questions, Zeno would start by introducing it-
self. Next, the experiment would start and Zeno would show one emotion, which
was randomly selected from either the default facial expressions or the Denver
facial expressions. After the animation, Zeno returned to a neutral expression.
We used a forced-choice format in which the participant was presented with six emoticons, each depicting one of the six emotions, and selected the emoticon they thought best represented Zeno’s emotion. The emoticons of the popular messaging app WhatsApp were used for this task, to make the choices more concrete and interesting to children [18]. The corresponding emotion was also written below each emoticon. The same process was used for the remaining emotions, until the participant had evaluated each emotion. We utilised the robot-mediated inter-
viewing method [19] and had Zeno ask the participant three questions regarding
the experiment. These questions included the participant’s opinion on the ex-
periment, which emotion he or she thought was most difficult to recognise, and
whether Zeno could improve anything. Afterwards, the experimenters debriefed
the participant.
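For clarity, a minimal sketch of this trial procedure in Python (our own illustration; play_expression() and ask_forced_choice() are hypothetical placeholders for the robot animation and the emoticon-selection step):

```python
import random

EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "disgust"]

def run_session(participant_id, play_expression, ask_forced_choice):
    # Each participant evaluates both sets; the default set has no disgust expression.
    trials = [("denver", e) for e in EMOTIONS] + \
             [("default", e) for e in EMOTIONS if e != "disgust"]
    random.shuffle(trials)  # expressions are shown in random order

    responses = []
    for expression_set, emotion in trials:
        play_expression(expression_set, emotion)  # 4 s animation, then back to neutral
        choice = ask_forced_choice(EMOTIONS)      # six emoticons, forced choice
        responses.append((participant_id, expression_set, emotion, choice))
    return responses
```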
5 Results
To calculate the main effect of the addition of affect bursts, we aggregated the emotions for the visual and for the audio-visual condition, and ran a chi-squared test, which indicated a significant difference (χ²(1, N = 280) = 6.16, p = .01, φ = .15). The addition of affect bursts to the facial expressions improved the overall recognition rate of the emotions, as can be seen in Figure 2. To calculate the main effect of the two sets of facial expressions, we aggregated the emotions from both sets and ran a chi-squared test. The difference was not significant (χ²(1, N = 280) = 0.16, p = .69). The emotion disgust is omitted from both chi-squared tests, because only the Denver facial expressions covered this emotion.

[Figure 2: bar chart of the percentage of correct responses (0–100%) per condition (Audio-Visual, Visual), grouped by emotion set (Default, Denver).]
Fig. 2. Recognition rates of the Denver and default sets of facial expressions, excluding disgust, for both conditions. The error bars represent the 95% confidence interval.
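For reference, a minimal sketch in Python showing how the first chi-squared test can be reproduced from the correct/incorrect counts that follow from Tables 1 and 2 (our own illustration; that Yates' continuity correction was applied is an assumption, although it reproduces the reported values):

```python
from math import sqrt
from scipy.stats import chi2_contingency

# Correct vs. incorrect responses per condition, aggregated over both
# expression sets and excluding disgust (derived from Tables 1 and 2):
# audio-visual: 119 of 150 correct; visual: 85 of 130 correct.
observed = [[119, 150 - 119],   # audio-visual condition
            [85, 130 - 85]]     # visual condition

# chi2_contingency applies Yates' continuity correction by default for 2x2 tables.
chi2, p, dof, expected = chi2_contingency(observed)
n = sum(sum(row) for row in observed)
phi = sqrt(chi2 / n)  # effect size for a 2x2 table

print(f"chi2({dof}, N = {n}) = {chi2:.2f}, p = {p:.2f}, phi = {phi:.2f}")
# -> chi2(1, N = 280) = 6.16, p = 0.01, phi = 0.15
```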
5.1 Visual condition
Table 1 shows the confusion matrix for the facial expressions shown in isolation. The mean recognition rate for Zeno’s default facial expressions was 66% (SD = 29%). The emotions joy and sadness were well recognised by the participants, with recognition rates of 100% and 92%, respectively. Anger was recognised correctly by eight participants (62%), but was confused with disgust by four participants. Fear and surprise were both recognised correctly by five participants (38%). Seven participants confused fear with surprise, and surprise was confused with joy six times.
For the Denver facial expressions (M = 62%, SD = 25%), joy and anger had high recognition rates of 100% and 85%, respectively. Whereas the default facial expression for surprise was confused with joy, the Denver facial expression for surprise was confused with fear instead. Conversely, fear was confused with surprise by seven participants. Surprise and fear were correctly recognised by 54% and 38% of the participants, respectively. The recognition rate for sadness was 46%, and four participants confused it with disgust. Lastly, seven participants confused disgust with anger; the recognition rate for disgust was 46%.
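A short sketch in Python (our own illustration) showing how the per-emotion recognition rates and the reported mean and standard deviation follow from the default-set rows of Table 1:

```python
import statistics

EMOTIONS = ["Joy", "Sadness", "Anger", "Fear", "Surprise", "Disgust"]

# Default-set rows of Table 1 (visual condition): stimulus -> response counts
# in the order of EMOTIONS, n = 13 responses per stimulus.
default_visual = {
    "Joy":      [13, 0, 0, 0, 0, 0],
    "Sadness":  [0, 12, 0, 0, 0, 1],
    "Anger":    [0, 1, 8, 0, 0, 4],
    "Fear":     [0, 0, 0, 5, 7, 1],
    "Surprise": [6, 0, 0, 1, 5, 1],
}

# Recognition rate per emotion: correct responses (the diagonal) over the row total.
rates = {stim: 100 * counts[EMOTIONS.index(stim)] / sum(counts)
         for stim, counts in default_visual.items()}

print({stim: round(rate) for stim, rate in rates.items()})
# Mean 66% and (sample) standard deviation 29%, as reported above.
print(round(statistics.mean(rates.values())), round(statistics.stdev(rates.values())))
```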
Table 1. Perception of the facial expressions in isolation (n = 13).

                                     Response
Stimulus      % correct    Joy   Sadness   Anger   Fear   Surprise   Disgust
Default
  Joy            100%       13      -         -      -       -          -
  Sadness         92%        -     12         -      -       -          1
  Anger           62%        -      1         8      -       -          4
  Fear            38%        -      -         -      5       7          1
  Surprise        38%        6      -         -      1       5          1
Denver
  Joy            100%       13      -         -      -       -          -
  Sadness         46%        -      6         1      1       1          4
  Anger           85%        -      1        11      -       -          1
  Fear            38%        -      -         1      5       7          -
  Surprise        54%        1      -         -      4       7          1
  Disgust         46%        -      -         7      -       -          6
Table 2. Perception of the facial expressions combined with affect bursts (n = 15).

                                     Response
Stimulus      % correct    Joy   Sadness   Anger   Fear   Surprise   Disgust
Default
  Joy            100%       15      -         -      -       -          -
  Sadness         87%        -     13         -      -       -          2
  Anger           87%        -      -        13      -       -          2
  Fear            80%        -      -         -     12       3          -
  Surprise        53%        5      -         -      1       8          1
Denver
  Joy             93%       14      -         -      -       -          1
  Sadness         73%        -     11         -      -       2          2
  Anger           87%        -      -        13      1       -          1
  Fear            47%        -      1         -      7       7          -
  Surprise        87%        -      -         -      2      13          -
  Disgust         80%        -      2         1      -       -         12
5.2 Audio-visual condition
In the audio-visual condition, the facial expressions were combined with the corresponding affect bursts. With the exception of surprise, all default facial expressions combined with affect bursts were recognised correctly 80% of the time or more (see Table 2). The mean recognition rate was 81% (SD = 17%). Surprise was recognised correctly by eight participants (53%), and confused with joy by five participants.
With the exception of fear, the Denver facial expressions combined with affect bursts had high recognition rates, ranging from 73% to 93%. Across all six emotions, the mean recognition rate was 78% (SD = 17%). Fear was recognised correctly by seven participants (47%), but was confused with surprise by seven participants as well.
6 Discussion and Conclusion
In the study presented in this paper, we set out to determine whether affect
bursts can be used effectively to help constrain the meaning of Zeno’s facial
expressions. Compared to the facial expressions shown in isolation, the addition of the affect bursts increased the recognition rates by 15 percentage points on average. This constraining effect is well illustrated by the default facial expression for anger and the Denver facial expression for disgust, which look very similar to each other, as can be seen in Figure 1. The participants often confused these facial expressions
with either anger or disgust. However, with the addition of the affect bursts, the
participants were able to disambiguate the facial expressions.
Not all facial expressions were recognised well. The default facial expression for surprise was poorly recognised, both with and without the affect burst. Surprise was often confused with joy, possibly because the facial expression also uses the corners of Zeno’s mouth to create a slight smile. Additionally, the Denver
facial expression for fear was often confused with surprise, regardless of the
addition of the affect burst. In human emotion recognition, fear and surprise
are also often confused (e.g., [20, 21]). While the affect burst for fear did help
constrain the meaning of the default facial expression of fear, it failed to do so
in combination with the Denver facial expression of fear. Salvador et al. [7] also
reported low recognition rates for the Denver facial expression of fear. However,
with the addition of an emotional gesture, they were able to greatly improve the
recognition rate of fear.
While we expected that the caricatured default facial expressions of emotion would be easier to recognise than the more humanlike Denver facial expressions, we did not find such a difference. Nevertheless, there are differences between the sets for specific facial expressions. Of the six emotions, only the facial expression for joy was well recognised in both sets. In addition to joy, the default facial expression for sadness and the Denver facial expression for anger were also well recognised. The other facial expressions were ambiguous in their meaning and required additional emotional information to be perceived correctly.
In light of an intervention that aims to teach autistic children how to recognise
emotions, there is also a downside to expressing emotions using two modalities.
The autistic children may rely solely on the affect bursts for recognising emotions,
and not look at Zeno’s facial expression. If this is the case, they will not learn
that a person’s face can also express emotions and how to recognise them. For
those children, additional effort is needed in the design of the intervention to
ensure that they do pay attention to Zeno’s facial expressions.
For future research, we aim to investigate whether the addition of affect
bursts also helps constrain the meaning of the facial expressions for autistic
children. While typically developing children can easily process multimodal information, this may be difficult for autistic children [22, 23], which may reduce the effect of adding affect bursts that we found in our study. Conversely, Xavier
et al. [24] reported an improvement in the recognition of emotions when both
auditory and visual stimuli were presented.
While we found differences in recognition rates for specific facial expressions
between the default facial expressions and the Denver facial expressions, we did
not find an overall difference in recognition rate between these two sets of facial
expressions. We conclude that when Zeno’s facial expressions are presented in
isolation, the emotional meaning is not always clear, and additional information
is required to disambiguate the meaning of the facial expression. Affect bursts
can provide a developmentally appropriate manner to help constrain the meaning
of Zeno’s facial expressions, making them easier to recognise.
Acknowledgement
We are grateful to Michelle Salvador, Sophia Silver and Mohammad Mahoor for
sharing their facial expressions for Zeno R50 with us. This work has received
funding from the European Union’s Horizon 2020 research and innovation pro-
gramme under grant agreement No. 688835 (DE-ENIGMA).
References
1. Uljarevic, M., Hamilton, A.: Recognition of Emotions in Autism: A Formal Meta-Analysis. Journal of Autism and Developmental Disorders 43(7), 1517–1526 (2013). doi: 10.1007/s10803-012-1695-5
2. American Psychiatric Association: Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author (2013)
3. Halberstadt, A. G., Denham, S. A., Dunsmore, J. C.: Affective Social Competence. Social Development 10(1), 79–119 (2001). doi: 10.1111/1467-9507.00150
4. Strand, P. S., Downs, A., Barbosa-Leiker, C.: Does facial expression recognition provide a toehold for the development of emotion understanding? Developmental Psychology 52(8), 1182–1191 (2016). doi: 10.1037/dev0000144
5. Diehl, J. J., Schmitt, L. M., Villano, M., Crowell, C. R.: The clinical use of robots for individuals with Autism Spectrum Disorders: A critical review. Research in Autism Spectrum Disorders 6(1), 249–262 (2012). doi: 10.1016/j.rasd.2011.05.006
6. Costa, S. C., Soares, F. O., Santos, C.: Facial Expressions and Gestures to Convey Emotions with a Humanoid Robot. In: International Conference on Social Robotics, pp. 542–551 (2013). doi: 10.1007/978-3-319-02675-6_54
7. Salvador, M. J., Silver, S., Mahoor, M. H.: An emotion recognition comparative study of autistic and typically-developing children using the Zeno robot. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 6128–6133 (2015). doi: 10.1109/ICRA.2015.7140059
8. Chevalier, P., Martin, J.-C., Isableu, B., Bazile, C., Tapus, A.: Impact of sensory preferences of individuals with autism on the recognition of emotions expressed by two robots, an avatar, and a human. Autonomous Robots 41(3), 613–635 (2016). doi: 10.1007/s10514-016-9575-z
9. Hassin, R. R., Aviezer, H., Bentin, S.: Inherently Ambiguous: Facial Expressions of Emotions in Context. Emotion Review 5(1), 60–65 (2013). doi: 10.1177/1754073912451331
10. Barrett, L. F., Mesquita, B., Gendron, M.: Context in Emotion Perception. Current Directions in Psychological Science 20(5), 286–290 (2011). doi: 10.1177/0963721411422522
11. Hoffner, C., Badzinski, D. M.: Children’s Integration of Facial and Situational Cues to Emotion. Child Development 60(2), 411–422 (1989). doi: 10.2307/1130986
12. Flom, R., Bahrick, L. E.: The development of infant discrimination of affect in multimodal and unimodal stimulation: The role of intersensory redundancy. Developmental Psychology 43(1), 238–252 (2007). doi: 10.1037/0012-1649.43.1.238
13. Scherer, K. R.: Vocal communication of emotion: A review of research paradigms. Speech Communication 40(1-2), 227–256 (2003). doi: 10.1016/S0167-6393(02)00084-5
14. Barrett, L. F., Lindquist, K. A., Gendron, M.: Language as context for the perception of emotion. Trends in Cognitive Sciences 11(8), 327–332 (2007). doi: 10.1016/j.tics.2007.06.003
15. Schröder, M.: Experimental study of affect bursts. Speech Communication 40(1-2), 99–116 (2003). doi: 10.1016/S0167-6393(02)00078-X
16. Belin, P., Fillion-Bilodeau, S., Gosselin, F.: The Montreal Affective Voices: A validated set of nonverbal affect bursts for research on auditory affective processing. Behavior Research Methods 40(2), 531–539 (2008). doi: 10.3758/BRM.40.2.531
17. Ekman, P., Friesen, W. V., Hager, J. C.: Facial Action Coding System (FACS): A technique for the measurement of facial action. Palo Alto: Consulting Psychologists Press (1978)
18. Borgers, N., de Leeuw, E., Hox, J.: Children as Respondents in Survey Research: Cognitive Development and Response Quality. Bulletin de Méthodologie Sociologique 66(1), 60–75 (2000). doi: 10.1177/075910630006600106
19. Wood, L. J., Dautenhahn, K., Rainer, A., Robins, B., Lehmann, H., Syrdal, D. S.: Robot-Mediated Interviews - How Effective Is a Humanoid Robot as a Tool for Interviewing Young Children? PLoS ONE 8(3), e59448 (2013). doi: 10.1371/journal.pone.0059448
20. Calder, A. J., Burton, A. M., Miller, P., Young, A. W., Akamatsu, S.: A principal component analysis of facial expressions. Vision Research 41(9), 1179–1208 (2001). doi: 10.1016/S0042-6989(01)00002-5
21. Castelli, F.: Understanding emotions from standardized facial expressions in autism and normal development. Autism 9(4), 428–449 (2005). doi: 10.1177/1362361305056082
22. Happé, F., Frith, U.: The Weak Coherence Account: Detail-focused Cognitive Style in Autism Spectrum Disorders. Journal of Autism and Developmental Disorders 36(1), 5–25 (2006). doi: 10.1007/s10803-005-0039-0
23. Collignon, O., Charbonneau, G., Peters, F., Nassim, M., Lassonde, M., Lepore, F., Mottron, L., Bertone, A.: Reduced multisensory facilitation in persons with autism. Cortex 49(6), 1704–1710 (2013). doi: 10.1016/j.cortex.2012.06.001
24. Xavier, J., Vignaud, V., Ruggiero, R., Bodeau, N., Cohen, D., Chaby, L.: A Multidimensional Approach to the Study of Emotion Recognition in Autism Spectrum Disorders. Frontiers in Psychology 6, 1–9 (2015). doi: 10.3389/fpsyg.2015.01954