Modelling user’s attitudinal reactions to the agent utterances: focus on the verbal content
Caroline Langlet, Chloé Clavel
Institut Mines-Télécom ; Télécom ParisTech ; CNRS LTCI
CNRS LTCI Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13
caroline.langlet@telecom-paristech.fr, chloe.clavel@telecom-paristech.fr
Abstract
With a view to developing a module for the detection of the user’s expressions of attitude in a human-agent interaction, the present
paper proposes to go beyond the classical positive vs. negative distinction used in sentiment analysis and provides a model of the user’s
attitudes in verbal content, as defined in (Martin and White, 2005). The model considers the interaction context by modelling the link
between the user’s attitudes and the previous agent utterance. The model is here confronted with the SEMAINE corpus. We provide
firstly an overall analysis of the annotation results in terms of labelled user and agent schemas and, secondly, an in-depth analysis
of the relation between the agent’s schemas and the user’s schemas. The analysis of these annotations shows that the user’s attitudes and
the preceding agent utterances have properties in common. Most of the user’s attitudes linked to an agent utterance expressing an
attitude have the same polarity. Moreover, a quarter of the targets appraised by the user refer to a target previously appraised by the agent.
Keywords: Sentiment analysis, virtual agent, human-machine interaction
1. Introduction
One of the key scientific challenges of the research field of
embodied conversational agents (ECA) is to improve the
interaction with human users by giving the agent the capability
of integrating the user’s sentiments and opinions. Most
of the proposed solutions take into account acoustic
features (Schuller et al., 2011) or facial expressions; the
verbal content of the user is more and more integrated, but
it remains only partially exploited, despite the recent advances
in sentiment analysis.
The research field of sentiment analysis and opinion mining
proposes a bank of methods dedicated to detecting opinions
and sentiments in written texts such as the ones provided
by social networks (Pang and Lee, 2008). These methods
differ in their applicative goals, the theoretical frameworks
to which they refer and the terminology used (affects, sentiments,
feelings, opinions, evaluations). While some of
them were designed to classify texts and only focus on the
valence (positive vs. negative) of sentiments, other methods,
such as (Neviarouskaya et al., 2010), aim to go beyond
this distinction and propose a fine-grained analysis of these phenomena.
These methods rely on more complex frameworks such as
Martin and White’s model (Martin and White, 2005) –
described in Section 2 –, which provides a classification of
attitudes as they are expressed in English.
The development of a module for the detection of the user’s
sentiment in a human-agent interaction requires tackling
various scientific issues (Clavel et al., 2013): the use of
a relevant theoretical framework – as in the opinion mining
approaches –, the integration of the interaction context, the
integration of the multimodal context for face-to-face
interactions, and the processing time of sentiment detection.
The present paper proposes to tackle two of these issues
by providing a model of the user’s attitudes in verbal content
grounded in Martin and White’s model (Martin and
White, 2005) and dealing with the interaction context (Section
3). The user’s attitudes are confronted with the agent’s
speech by linking them to the previous agent utterance. With
this model, we aim to figure out whether the agent’s speech
can trigger or constrain an expression of attitude, its target
or its polarity, and to obtain initial information about the syntactic
and semantic features of attitude expressions.
The final aim of this work is to design a module for the
detection of the user’s attitudes which will use the information
given by the system about the agent’s speech and will be
grounded in linguistic rules considering semantic and syntactic
cues.
Such a model is also especially suited for further studies on
alignment in interactions (Campano et al., 2014), where the
user’s alignment to the agent at the attitudinal level can be
investigated as a cue of engagement.
The model is here confronted with the SEMAINE corpus
(McKeown et al., 2011) (Section 4) which has been man-
ually labelled according to the annotation schema derived
from this model. The obtained annotations are thus anal-
ysed in the same section.
2. Background: theoretical frameworks of
sentiment modelling
Most detection systems refer to the psychological
dimensional model from Osgood (Osgood et al.,
1975) by focusing on the valence axis. Other approaches,
such as (Wiebe et al., 2005) or (Breck et al., 2007), refer to
the Private State Theory, which defines mental states as
involving opinions, beliefs, judgements, appraisals and affects.
Our model is grounded in Martin and White (2005),
which has already proven its worth in (Neviarouskaya et al.,
2010), (Bloom et al., 2007) and (Whitelaw et al., 2005).
It provides a complex framework for describing
how attitudes are expressed in English, and goes beyond
the classical positive vs. negative distinction. This model
involves three sub-systems:
• the sub-system of Attitude refers to the emotional reactions
and the evaluations of behaviours or things. Three
kinds of attitude are defined: the affects, which are
concerned with emotional reactions, the judgements,
which relate to evaluations of people’s behaviours
according to normative principles, and the appreciations,
which deal with evaluations of semiotic and
natural phenomena. The authors specify that an attitude
has a source, the person evaluating or experiencing,
and a target, the entity which is evaluated or which triggers
an affect.
• the sub-system of Engagement concerns the inter-subjective
dimension and how the speaker deals with
the potential other positions on the topic.
• the sub-system of Graduation describes how the degree
of an evaluation can be adjusted.
In order to simplify our annotation model, we gather
appreciations and judgements into the same main category of
“evaluation”: they share several linguistic properties and
the same patterns can express both of them (Bednarek, 2009).
3. Annotation model of user’s attitudinal
expressions in interaction
The proposed annotation model aims to identify the particularities
of the user’s attitudinal expressions in interaction. It
integrates the verbal content of both the agent’s utterances and
the user’s utterances, in order to model the link between the
user’s attitudinal expressions and the agent utterances.
Specific labels are defined for both the agent and the user.
Illocutionary acts of agent’s utterances In order to
model the potential influence of the agent’s speech over
the user’s expressions of attitude, we label the agent’s ut-
terances regarding the illocutionary acts that they perform.
We refer to Searle’s classification (Searle, 1976) which in-
cludes five categories:
• the representative acts: their purpose is to commit the
speaker to something’s being the case, to the truth of
the expressed proposition. Example: It’s raining;
• the directive acts, which attempt to get the hearer to do
something. Example: I order you to leave;
• the commissive acts, which commit the speaker to
some future course of action. Example: I promise to
pay you the money;
• the expressive acts, which express the speaker’s psychological
state about a state of affairs. Example: I
apologize for stepping on your toe;
• the declaration acts, whose successful performance
ensures that the propositional content corresponds
to the world. Example: You’re fired.
For labelling each agent’s utterance, we use the agent utter-
ance unit to which we add a feature specifying the type of
the illocutionary act. One agent’s speech turn can contain
several utterances performing different illocutionary acts.
For example, in the sentence “well things will normally get
better, can you eh think of something that might make you
happy”, we identify two agent utterance units: “well things
will normally get better” is a representative illocutionary
act, and “can you eh think of something that might make
you happy” is a directive illocutionary act.
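The labelling scheme above – one unit per utterance, each carrying the Searle category it performs – can be sketched as a small data structure. This is an illustrative sketch only; the class and variable names are our own, not part of the annotation model itself.

```python
from dataclasses import dataclass

# The five Searle categories used to label agent utterance units.
ILLOCUTIONARY_ACTS = {"representative", "directive", "commissive",
                      "expressive", "declaration"}

@dataclass
class AgentUtteranceUnit:
    text: str
    illocutionary_act: str

    def __post_init__(self):
        # Reject labels outside the Searle classification.
        if self.illocutionary_act not in ILLOCUTIONARY_ACTS:
            raise ValueError(f"unknown illocutionary act: {self.illocutionary_act}")

# The example speech turn from the text, manually segmented into two units:
turn = [
    AgentUtteranceUnit("well things will normally get better",
                       "representative"),
    AgentUtteranceUnit("can you eh think of something that might make you happy",
                       "directive"),
]
acts = [u.illocutionary_act for u in turn]
```

One speech turn is thus a list of units, which keeps the one-act-per-utterance constraint explicit.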
Specific features of attitudes Our model considers both
the agent’s and the user’s expressions of attitude. An attitude
expression comprises three components which we need
to label: the linguistic mark referring to the attitude, the
source and the target. Information about the attitude type
and its polarity also has to be specified.
• The attitude type (affect or evaluation) and the polarity
(positive or negative) are specified by a feature set
associated with the user schema – described below –
and the agent utterance unit. It should be specified
that when the agent does not refer to an attitude, these
features receive the none value.
• When the user’s and the agent’s expressions of attitude
have an expressed target and source, we use the target
unit and the source unit. The target unit deals
with the phrase referring to the entity, the process or
the behaviour which is evaluated or which triggers the attitude,
and the source unit has to do with the phrase referring to the
source of the evaluation or the emoter of the affect. In
order to check the influence of the agent’s speech over
the user’s attitudes, the user’s target is linked to the
agent’s target when it refers to the same entity
or one of its sub-aspects.
• The attitude mark is only labelled for the user’s attitude
and not for the agent’s. Regarding the agent,
the feature specifying the attitude type is enough to
indicate whether the utterance conveys an attitude. However,
since our further detection module will have to focus
on the user’s expressions of attitude, we need to retrieve
information about its linguistic mark. The attitude
mark unit concerns, at the phrase level, both the
linguistic marks referring to an attitude and the modifiers
which can shift, intensify or diminish its semantic
value or its valence. For example, in a sentence such as “I
don’t really like my work”, “don’t really like” is tagged
as an attitude mark.
Linking the user’s attitude to the agent’s previous speech
turn Finally, the agent utterance unit and the source and
target units to which it may be linked compose an agent
schema. Similarly, the attitude mark unit and the source
and target units to which it may be linked compose a user
schema. Each user schema is linked to the previous agent
schema by a simple relation notifying this precedence.
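As a minimal sketch of the schema structure just described – an agent schema, a user schema, and the precedence relation between them – the following shows one possible in-memory representation. All class and field names are illustrative assumptions, not part of the annotation model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentSchema:
    # The agent utterance unit with its illocutionary act feature,
    # plus optional source and target units.
    utterance: str
    illocutionary_act: str
    attitude_type: str = "none"      # "affect", "evaluation" or "none"
    polarity: str = "none"           # "positive", "negative" or "none"
    source: Optional[str] = None
    target: Optional[str] = None

@dataclass
class UserSchema:
    # The attitude mark unit plus optional source and target units;
    # the precedence relation points back to the previous agent schema.
    attitude_mark: str
    attitude_type: str
    polarity: str
    source: Optional[str] = None
    target: Optional[str] = None
    previous_agent_schema: Optional[AgentSchema] = None

# The Figure 1 example, expressed with these structures:
agent = AgentSchema("do you like outdoor activities", "directive",
                    attitude_type="evaluation", polarity="positive",
                    source="you", target="outdoor activities")
user = UserSchema("don't like", "evaluation", "negative",
                  source="i", target="being indoors",
                  previous_agent_schema=agent)
```

The precedence relation is simply a reference from each user schema to the agent schema that came before it, which makes the agent/user comparisons of Section 4 straightforward to compute.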
Topic segmentation A single conversation can comprise
different topics. Here, we define the topic as a sequence
which takes place across several agent and user speech
turns. A main topic can include sub-topics, which can refer
to its sub-aspects. For example, in a conversation where
the speakers talk about Christmas and the gifts they received,
“christmas” is tagged as the topic and “gift” as the sub-topic.
It may be important to check whether the user’s attitudes
are linked to the ongoing topic of the conversation.
Figure 1: Annotation of two specific utterances. The noun “activities” here opens a new topic
For this purpose, two units are dedicated to the topic
labelling: the topic unit and the sub-topic unit. Even if the
topic is sequential and crosses several speech turns, we only
label the first occurrence of the typical word of the topic
or sub-topic. A dedicated relation links a sub-topic to its
main topic. A feature was added to both topic and sub-topic
units which specifies whether the topic or sub-topic was
started by the user or by the agent. When a target refers
to a topic or a sub-topic, we link them with a specific relation.
As a summary, Figure 1 presents how two specific sentences
are labelled by referring to our model. Since the
agent’s utterance has an interrogative form, it is labelled as
a directive illocutionary act. As explained by Searle (1976),
questions are a species of directives since they are attempts
by the speaker to get the hearer to answer, i.e. to perform
a speech act. The propositional content of this question
concerns an evaluation expressed by “like”. The source
of the evaluation does not match with the speaker – i.e. the
agent – but with the hearer – i.e. the user. The agent asks
the user to express himself about a positive evaluation
regarding an already chosen target, “outdoor activities”.
Regarding the user’s sentence, “don’t like” is labelled as an
attitude mark, and “i” as its source. The target of the user’s
attitude, “being indoors”, is linked to the target of the agent’s
attitude: even though they are near-antonyms, an ontological
reference exists between them.
4. Labelling user’s attitude in a human-agent
(operator) interaction: the SEMAINE
Corpus
The model is here confronted with the SEMAINE corpus
(McKeown et al., 2011). As a preliminary study, one annotator
labelled the corpus, but we plan to provide a second
annotation with another annotator. This corpus comprises 65
manually-transcribed sessions where a human user interacts
with a human operator acting the role of a virtual agent.
These interactions are based on a scenario involving four
agent characters: Poppy, happy and outgoing, Prudence,
sensible and level-headed, Spike, angry and confrontational,
and Obadiah, depressive and gloomy. The agent’s utterances
are constrained by a script (however, some deviations from
the script occur in the database) with the aim of pushing the
user toward the state of the played character. Sixteen sessions,
four for each character, were labelled according to the
previously described annotation model. Thirteen different
users are involved in these sessions. Regarding the agents,
there are four different actors playing the role of Poppy,
three for Prudence, three for Obadiah and three for Spike.
Overall, five different actors appear across all sessions.
As an annotation tool, we use the Glozz platform
(Widlöcher and Mathet, 2012), which allows us to locate,
identify and describe linguistic phenomena in textual
documents.
Overall analysis The labelled sessions have a variable
number of speech turns (132 for the longest session, 38 for
the shortest). In the entire corpus, the users and the agents
have an almost equal number of speech turns: 559 for
the users and 579 for the agents. Out of the 450 labelled agent
schemas (0.77 per speech turn), 46% express a directive act,
42% an expressive act and 12% a representative act. No declaration
or commissive speech acts were found. This is probably
due to the nature of the scenario on which the SEMAINE
corpus is grounded: a narrative conversation where the user
is pushed to talk about his life. As shown in Figure 2, the
distribution of illocutionary acts is similar across agent
identities, with the exception of Obadiah: in the Obadiah
sessions, expressive acts hold a majority and occur more
often than in the other agents’ sessions.
As explained above, the operator’s utterances can express
attitudes. In our corpus, 339 operator schemas contain
expressions of attitude: 187 affects and 152 evaluations.
With regard to the agent’s identity, the expressions of affect
are more numerous than the expressions of evaluation, except
for the Prudence sessions (see Figure 3). This is probably
due to Prudence’s personality, which the operator has
to play: a sensible and level-headed person who expresses
evaluations about the user’s behaviour and asks the user to
express attitudes about specific things.
The annotation model also gives an insight into the user’s
expressions of attitude (238 labelled user schemas). In
particular, we show that the expressions of evaluation are more
frequent than the expressions of affect, but without a clear
majority: 44% are affects and 56% evaluations.
Figure 2: Distribution of illocutionary acts according to the
agent’s identity
Figure 3: Distribution of affects and evaluations according
to the agent’s identity
Illocutionary acts and user’s attitudes Both the directive
and expressive acts have a significant number of occurrences
in the entire labelled corpus. Nevertheless, regarding
the specific relation linking the user’s attitudes to the
agent’s utterances, the directive illocutionary acts prevail:
57% of the user’s attitudes are linked with an agent directive
act. Most of the directive acts labelled in the
corpus have an interrogative form: the agent asks the user
to talk about something. Thus, as requests, these agent
utterances may easily involve an attitudinal reaction from the
user. Moreover, some of them are explicit requests from the
agent for the user to express attitudes.
Polarity accordance The user’s and the agent’s attitudes are
studied according to the type of agent played by the operator
(see Table 1). As expected, the Poppy and Prudence
sessions express more positive attitudes, whereas Spike
essentially expresses attitudes with a negative polarity. The
distribution is more balanced concerning Obadiah. Moreover,
with regard to the relation between user schemas and agent
schemas, the polarity of the user’s attitudes is mostly the same
as the polarity expressed by the agent: the polarity of 71%
of the user schemas linked to an agent schema containing
an attitude matches the polarity of the agent’s attitude.
Furthermore, this polarity accordance occurs in most of the
sessions (see Figure 4).
Figure 4: For each session, number of user’s attitudes
linked to an agent’s attitude which have the same polarity
Sessions            Agent     Agent     User      User
                    positive  negative  positive  negative
                    attitudes attitudes attitudes attitudes
Poppy sessions      92%       8%        83%       17%
Prudence sessions   96%       4%        67%       33%
Spike sessions      20%       80%       40%       60%
Obadiah sessions    41%       59%       52%       48%
Table 1: Polarity of user and agent attitudes according to the
agent identity
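The polarity-accordance statistic reported above (71% of linked user schemas matching the agent’s polarity) is simple to compute once user schemas are linked to agent schemas. A sketch, using made-up toy pairs rather than the corpus data:

```python
# Each pair is (agent polarity, user polarity) for a user schema linked
# to the previous agent schema. These four pairs are toy data.
linked_pairs = [
    ("positive", "positive"),
    ("positive", "positive"),
    ("negative", "negative"),
    ("positive", "negative"),
]

# Keep only pairs where the agent schema actually expresses an attitude.
with_attitude = [(a, u) for a, u in linked_pairs if a != "none"]

# Share of linked user schemas whose polarity matches the agent's.
matches = sum(1 for a, u in with_attitude if a == u)
accordance = matches / len(with_attitude)
```

On the toy data above, three of the four pairs match, giving an accordance of 0.75; running the same computation over the labelled corpus would yield the 71% figure.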
Target Among the attitudes labelled by the user
schemas (238 in the entire corpus), 172 have a target. When
the user schema is linked to an agent schema containing
an attitude, the user’s target can refer to the agent’s target:
a relation in our model records this reference. In the entire
labelled corpus, about a quarter (27%) of the targets appraised
by the user refer to a target previously appraised by the agent.
These results show that the user does not always choose
which object he will appraise. This phenomenon needs to be
considered in order to design a sentiment analysis module:
with regard to the agent’s target – which will be known by the
system – it may be easier to process the potential attitudinal
reaction of the user and to find its target.
Topic In the entire labelled corpus, 61 lexical units are
labelled as topics and 51 as sub-topics – an average of
4 topics and 3.4 sub-topics per session. Among these topics
and sub-topics, 57% are started by the agent, 30% by
the user, and 13% arise through a kind of collaboration
between the agent and the user. For example, in
session 26, Poppy asks the user, “where is the best wake
up you ever had?”. The user answers “in a tent in kilimanjaro”.
Here, the agent’s question opens a new topic but does not
completely define it. It is the user’s answer which chooses
the new topic, but by following the indications given by the
agent’s question. When the user’s targets are not linked to
an agent’s target, they may be linked to a topic or a sub-topic.
In the entire labelled corpus, out of a total of 172,
28 user targets are linked to a topic and 47 to a sub-topic.
32% of these topics and sub-topics are started by the agent,
40% by the user and 28% result from the collaboration
described above between the agent and the user. Thus, as for
the targets, the user’s expressions of attitude are grounded
on elements which arise in the conversation through
the agent’s speech.
5. Discussion and tracks to design a module
for detection of user’s attitudes
As shown by the framework of our model and its different
units and features, we aim to describe the expressions of
attitude in a compositional way, i.e. we consider their meanings
as built from the sum of the meanings of their constituents.
The categories of our model provide the first semantic values
to describe this meaning and that of the agent’s utterances.
This compositional representation will be useful to
build our attitude detection module.
First, the semantic values regarding the agent’s utterances
(illocutionary acts, attitudes, etc.) will allow us to process
the user’s ones. Since there is some semantic accordance
between the agent’s utterances and the user’s attitudes, a
semantic characterisation of each agent utterance could be
used to anticipate the possible following user expressions
of attitude and to improve their analysis when they occur.
This characterisation could be implemented as a simplified
semantic feature set and used as an input of the module. For
instance, the feature set associated with the agent’s sentence
“Do you like outdoor activities” (see Figure 1) indicates
that the agent’s utterance invites the user to express an
attitude. The user’s expected attitude can thus be modelled
in the module, which can check whether its source and its
target are the same as the ones in the agent’s utterance by
using linguistic rules grounded in syntactic and semantic
cues. If no accordance is found, a more complex analysis
can be performed.
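This anticipation step could be sketched as follows: given the semantic feature set of the agent utterance, a cheap rule first checks whether the user’s reply appraises the agent’s target, and only falls back to a fuller analysis otherwise. The rule below (a plain substring match) and all names are illustrative assumptions, not the authors’ implementation, which would rely on proper syntactic and semantic cues.

```python
def expected_user_target(agent_features: dict, user_reply: str):
    """Return the agent's target if the user's reply seems to appraise it,
    or None if a fuller analysis is needed."""
    if agent_features.get("attitude_type") == "none":
        return None                # nothing to anticipate
    target = agent_features.get("target", "")
    if target and target in user_reply.lower():
        return target              # accordance found: reuse the agent's target
    return None                    # no accordance: run the complex analysis

# Simplified feature set for "Do you like outdoor activities" (Figure 1):
features = {"illocutionary_act": "directive",
            "attitude_type": "evaluation",
            "polarity": "positive",
            "target": "outdoor activities"}

hit = expected_user_target(features, "yes I really like outdoor activities")
miss = expected_user_target(features, "I don't like being indoors")
```

Here the first reply is resolved by the cheap rule, while the second (“being indoors”, a near-antonym of the agent’s target) would be handed to the more complex analysis.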
Second, our model can be refined in ways that will help the
future detection module. Regarding the agent’s utterances,
as shown in Section 4, the expressive and the directive
illocutionary acts have a large number of occurrences in the
corpus; these categories could thus be refined. For example,
two sub-categories could be linked to the directive
illocutionary act category: question and suggestion. Such
sub-categories could give more accurate information about
the meaning of the agent’s utterance, which will help to
improve the performance of our detection module. For
instance, if the feature set introduced above could specify
that the sentence has an interrogative form, this would limit
the number of likely user sentences to consider and allow
the module to use a more specific linguistic rule to process
it. Regarding the user’s attitudes, other features could be
added too. For example, with regard to the graduation
dimension, we need to distinguish different semantic values
among the valence modifiers or shifters: a negation will not
have to be analysed in the same way as an intensifier.
Moreover, it is important to provide – as an output –
information about the degree of graduation of the expressed
attitude. Indeed, the agent should not perform attitudes like
“I like outdoor activities” and “I like outdoor activities very
much” in the same way. Finally, the model also needs to
deal with the multimodality issue: to ensure this, the user’s
attitudes could also be linked to the agent’s non-verbal signals.
6. Conclusion and further work
This paper proposes a model of the user’s attitudes in verbal
content. In order to go beyond the classical positive vs.
negative distinction, this model examines features such as the
source, the target and the attitude type, and deals with
interaction by integrating information about the illocutionary acts
and the relations between user and agent units. This model
– confronted with the SEMAINE corpus – shows that these
features are relevant. The user’s attitudes have properties
in common with the agent’s attitudes, such as the polarity and,
to a lesser extent, the target. In further work, as explained in
Section 5, the simplified semantic representation provided
by the model will be refined. By doing so, our model will be
a strong foundation for designing our detection module.
7. Acknowledgment
The authors thank Catherine Pelachaud for valuable insights
and suggestions. This work has been supported
by the European collaborative project TARDIS and the
SMART Labex project1.
8. References
Bednarek, M. (2009). Language patterns and attitudes.
Functions of Language, 16(2):165–192.
Bloom, K., Garg, N., and Argamon, S. (2007). Extracting
appraisal expressions. In HLT-NAACL, April.
Breck, E., Choi, Y., and Cardie, C. (2007). Identifying
expressions of opinion in context. In Proceedings of the
International Joint Conference on Artificial Intelligence,
pages 2683–2688, San Francisco, CA. Morgan Kaufmann
Publishers.
Campano, S., Durand, J., and Clavel, C. (2014). Compar-
ative analysis of verbal alignment in human-human and
human-agent interactions. In Language Resources and
Evaluation Conference. to appear, May.
1http://www.smart-labex.fr/
Clavel, C., Pelachaud, C., and Ochs, M. (2013). User’s
sentiment analysis in face-to-face human-agent interac-
tions prospects. In Workshop on Affective Social Sig-
nal Computing, Satellite of Interspeech. Association for
Computational Linguistics, August.
Martin, J. R. and White, P. R. (2005). The Language
of Evaluation: Appraisal in English. Palgrave Macmillan,
Basingstoke and New York.
McKeown, G., Valstar, M., Cowie, R., Pantic, M., and
Schroder, M. (2011). The SEMAINE database: Annotated
multimodal records of emotionally colored conversations
between a person and a limited agent. IEEE Transactions
on Affective Computing, 3(1):5–17, Jan–March.
Neviarouskaya, A., Prendinger, H., and Ishizuka, M.
(2010). Recognition of affect, judgment, and appreci-
ation in text. In Proceedings of the 23rd International
Conference on Computational Linguistics, COLING ’10,
pages 806–814, Stroudsburg, PA, USA. Association for
Computational Linguistics.
Osgood, C., May, W. H., and Miron, M. S. (1975). Cross-cultural
Universals of Affective Meaning. University of
Illinois Press, Urbana.
Pang, B. and Lee, L. (2008). Opinion mining and senti-
ment analysis. Foundations and Trends in Information
Retrieval, 2(1-2):1–135, January.
Schuller, B., Batliner, A., Steidl, S., and Seppi, D. (2011).
Recognising realistic emotions and affect in speech:
State of the art and lessons learnt from the first challenge.
Speech Communication, 53(9-10):1062–1087, Novem-
ber.
Searle, J. R. (1976). A classification of illocutionary acts.
Language in society, 5(01):1–23.
Whitelaw, C., Garg, N., and Argamon, S. (2005). Using
appraisal taxonomies for sentiment analysis. In Proceedings
of CIKM-05, the ACM SIGIR Conference on Information
and Knowledge Management, April.
Widlöcher, A. and Mathet, Y. (2012). The Glozz platform:
A corpus annotation and mining tool. In Proceedings of
the 2012 ACM Symposium on Document Engineering,
pages 171–180, Paris, France, September.
Wiebe, J., Wilson, T., and Cardie, C. (2005). Annotating
expressions of opinions and emotions in language.
Language Resources and Evaluation, 39(2-3):165–210.