Modelling user’s attitudinal reactions to the agent utterances: focus on the
verbal content
Caroline Langlet, Chloé Clavel
Institut Mines-Télécom; Télécom ParisTech; CNRS LTCI
Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France
Abstract
With the view to developing a module for the detection of the user's expressions of attitude in a human-agent interaction, the present paper proposes to go beyond the classical positive vs. negative distinction used in sentiment analysis and provides a model of user's attitudes in verbal content, as defined in (Martin and White, 2005). The model considers the interaction context by modelling the link between the user's attitude and the previous agent's utterance. The model is here confronted with the SEMAINE corpus. We provide firstly an overall analysis of the annotation results in terms of labelled user and agent schemas and, secondly, an in-depth analysis of the relation between the agent's schemas and the user's schemas. The analysis of these annotations shows that user's attitudes and their previous agent's utterances have properties in common. Most of the user's attitudes linked to an agent utterance expressing an attitude have the same polarity. Moreover, a quarter of the targets appraised by the user refer to a target previously appraised by the agent.
Keywords: Sentiment analysis, virtual agent, human-machine interaction
1. Introduction
One of the key scientific challenges of the research field of
embodied conversational agents (ECA) is to improve the
interaction with human users by giving the agent the capability of integrating user's sentiments and opinions. Most of the proposed solutions take into account acoustic features (Schuller et al., 2011) or facial expressions; the verbal content of the user is increasingly integrated, but only partially exploited, given the recent advances in sentiment analysis.
The research field of sentiment analysis and opinion mining proposes a bank of methods dedicated to detecting opinions and sentiments in written texts such as those provided by social networks (Pang and Lee, 2008). These methods differ in their application goals, the theoretical frameworks to which they refer and the terminology used (affects, sentiments, feelings, opinions, evaluations). While some of them were designed to classify texts and focus only on the valence (positive vs. negative) of sentiments, other methods, such as (Neviarouskaya et al., 2010), aim to go beyond this distinction and propose a fine-grained analysis of these phenomena. These methods rely on more complex frameworks such as Martin and White's model (Martin and White, 2005) – described in Section 2 – which provides a classification of attitudes as they are expressed in English.
The development of a module for the detection of user's sentiment in a human-agent interaction requires tackling various scientific issues (Clavel et al., 2013): the use of a relevant theoretical framework – as in the opinion mining approaches –, the integration of the interaction context, the integration of the multimodal context for face-to-face interactions, and the processing time of sentiment detection.
The present paper proposes to tackle two of these issues
by providing a model of user’s attitudes in verbal content
grounded on the Martin and White’s model (Martin and
White, 2005) and dealing with the interaction context (Section 3). The user's attitudes are confronted with the agent's speech by linking each of them to the previous agent utterance. With this model, we aim to figure out whether the agent's speech can trigger or constrain an expression of attitude, its target or its polarity, and to obtain initial information about the syntactic and semantic features of attitude expressions.
The final aim of this work is to design a module for the detection of user's attitudes which will use the information given by the system about the agent's speech and will be grounded on linguistic rules considering semantic and syntactic cues.
Such a model is also especially suited for further studies on
alignment in interactions (Campano et al., 2014), where the
user’s alignment to the agent at the attitudinal level can be
investigated as a cue of engagement.
The model is here confronted with the SEMAINE corpus
(McKeown et al., 2011) (Section 4) which has been man-
ually labelled according to the annotation schema derived
from this model. The obtained annotations are thus anal-
ysed in the same section.
2. Background: theoretical frameworks of
sentiment modelling
Most detection systems refer to the psychological dimensional model of Osgood (Osgood et al., 1975), focusing on the valence axis. Other approaches, such as (Wiebe et al., 2005) or (Breck et al., 2007), refer to the Private State theory, which defines mental states as involving opinions, beliefs, judgements, appraisals and affects. Our model is grounded on Martin and White (2005), which has already proven its worth in (Neviarouskaya et al., 2010), (Bloom et al., 2007) and (Whitelaw et al., 2005). It provides a complex framework for describing how attitudes are expressed in English and goes beyond the classical positive vs. negative distinction. This model involves three sub-systems:
- the sub-system of Attitude refers to the emotional reactions and the evaluations of behaviours or things. Three kinds of attitude are defined: the affects, which are concerned with emotional reactions; the judgements, which relate to evaluations of people's behaviours according to normative principles; and the appreciations, which deal with evaluations of semiotic and natural phenomena. The authors specify that an attitude has a source, the person evaluating or experiencing, and a target, the entity which is evaluated or which triggers an affect.
- the sub-system of Engagement concerns the inter-subjective dimension and how the speaker deals with the potential other positions on the topic.
- the sub-system of Graduation describes how the degree of an evaluation can be adjusted.
In order to simplify our annotation model, we gather appreciations and judgements into the same main category of "evaluation": they share several linguistic properties and the same patterns can express both of them (Bednarek, 2009).
3. Annotation model of user’s attitudinal
expressions in interaction
The proposed annotation model aims to identify the particu-
larities of the user’s attitudinal expressions in interaction. It
integrates the verbal content of both agent’s utterances and
user's utterances, in order to model the link between the user's attitudinal expressions and the agent's utterances. Specific labels are defined for both the agent and the user.
Illocutionary acts of agent's utterances In order to model the potential influence of the agent's speech over the user's expressions of attitude, we label the agent's utterances according to the illocutionary acts that they perform. We refer to Searle's classification (Searle, 1976), which includes five categories:
- the representative acts: their purpose is to commit the speaker to something's being the case, to the truth of the expressed proposition. Example: It's raining;
- the directive acts, which attempt to get the hearer to do something. Example: I order you to leave;
- the commissive acts, which commit the speaker to some future course of action. Example: I promise to pay you the money;
- the expressive acts, which express the speaker's psychological state about a state of affairs. Example: I apologize for stepping on your toe;
- the declaration acts, whose successful performance brings it about that the propositional content corresponds to the world. Example: You're fired.
For labelling each agent’s utterance, we use the agent utter-
ance unit to which we add a feature specifying the type of
the illocutionary act. One agent’s speech turn can contain
several utterances performing different illocutionary acts.
For example, in the sentence “well things will normally get
better, can you eh think of something that might make you
happy”, we identify two agent utterance units: “well things
will normally get better” is a representative illocutionary
act, and “can you eh think of something that might make
you happy” is a directive illocutionary act.
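The segmentation of a speech turn into utterance units, each carrying an illocutionary-act feature, can be sketched as follows (a minimal illustration using the paper's own example; the class name and validation are our assumptions):

```python
from dataclasses import dataclass

# Searle's five categories, as used by the annotation model.
SEARLE_ACTS = {"representative", "directive", "commissive",
               "expressive", "declaration"}

@dataclass
class AgentUtteranceUnit:
    text: str
    act: str  # one of Searle's five categories

    def __post_init__(self):
        if self.act not in SEARLE_ACTS:
            raise ValueError(f"unknown illocutionary act: {self.act}")

# The agent speech turn from the paper, split into two units:
turn = [
    AgentUtteranceUnit("well things will normally get better",
                       "representative"),
    AgentUtteranceUnit("can you eh think of something that might "
                       "make you happy", "directive"),
]
```
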
Specific features of attitudes Our model considers both the agent's and the user's expressions of attitude. An attitude expression comprises three components which we need to label: the linguistic mark referring to the attitude, the source and the target. Information about the attitude type and its polarity must also be specified.
- The attitude type (affect or evaluation) and the polarity (positive or negative) are specified by a feature set associated to the user's schema – described below – and the agent utterance unit. When the agent does not refer to an attitude, these features receive the none value.
- When the user's and the agent's expressions of attitude have an expressed target and source, we use the target unit and the source unit. The target unit deals with the phrase referring to the entity, process or behaviour that is evaluated or that triggers the attitude, and the source unit with the phrase referring to the source of the evaluation or the emoter of the affect. In order to check the influence of the agent's speech over the user's attitudes, the user's target is linked to the agent's target when it refers to the same entity or one of its sub-aspects.
- The attitude mark is labelled only for the user's attitude and not for the agent's. Regarding the agent, the feature specifying the attitude type is enough to indicate whether his utterance conveys an attitude. However, since our further detection module will have to focus on the user's expressions of attitude, we need to retrieve information about the linguistic mark. The attitude mark unit concerns, at the phrase level, both linguistic marks referring to an attitude and modifiers which can shift, intensify or diminish its semantic value or its valence. For example, in a sentence such as "I don't really like my work", "don't really like" is tagged as an attitude mark.
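The schema components above (mark, source, target, attitude type, polarity) can be sketched as a small data structure, annotated with the paper's own example. The dataclass names and the character-offset representation of spans are our assumptions, not the paper's format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    text: str
    start: int  # character offset in the utterance
    end: int    # exclusive end offset

@dataclass
class UserSchema:
    attitude_type: str          # "affect", "evaluation", or "none"
    polarity: str               # "positive", "negative", or "none"
    mark: Span                  # linguistic mark, including modifiers
    source: Optional[Span] = None
    target: Optional[Span] = None

utt = "I don't really like my work"
schema = UserSchema(
    attitude_type="evaluation",
    polarity="negative",                      # "don't" shifts the valence
    mark=Span("don't really like", 2, 19),
    source=Span("I", 0, 1),
    target=Span("my work", 20, 27),
)
```
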
Linking the user's attitude to the agent's previous speech turn Finally, the agent utterance unit and the source and target units to which it may be linked compose an agent schema. Similarly, the attitude mark unit and the source and target units to which it may be linked compose a user schema. Each user schema is linked to the previous agent schema by a simple relation notifying this precedence.
Topic segmentation A single conversation can comprise different topics. Here, we define the topic as a sequence which takes place across several agent and user speech turns. A main topic can include sub-topics, which can refer to its sub-aspects. For example, in a conversation where speakers talk about "christmas" and the gifts they received, "christmas" is tagged as the topic and "gift" as the sub-topic. It may be important to check whether the user's attitudes are linked to the ongoing topic of the conversation.
Figure 1: Annotation of two specific utterances. The noun “activities” here opens a new topic
For this purpose, two units are dedicated to the topic labelling: the topic unit and the sub-topic unit. Even if the topic is sequential and crosses several speech turns, we label the first occurrence of the typical word of the topic or sub-topic. A dedicated relation links a sub-topic to its main topic. A feature added to both topic and sub-topic units specifies whether the topic or sub-topic has been started by the user or the agent. When a target refers to a topic or a sub-topic, we link them with a specific relation.
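The topic and sub-topic units, with their sub-topic-to-topic relation and their started-by feature, can be sketched as follows (an illustrative encoding using the paper's christmas/gift example; the attribution of who started each topic is our assumption for the illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TopicUnit:
    word: str                  # first occurrence of the typical word
    started_by: str            # "user" or "agent"
    parent: Optional["TopicUnit"] = None  # set for sub-topics only

# The christmas/gift example from the paper:
christmas = TopicUnit("christmas", started_by="agent")
gift = TopicUnit("gift", started_by="user", parent=christmas)
```
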
As a summary, Figure 1 presents how two specific sentences are labelled by referring to our model. Since the agent's utterance has an interrogative form, it is labelled as a directive illocutionary act. As explained by Searle (1976), questions are species of directives since they are attempts by the speaker to get the hearer to answer, i.e. to perform a speech act. The propositional content of this question concerns an evaluation expressed by "like". The source of the evaluation does not match the speaker – i.e. the agent – but the hearer – i.e. the user. The agent asks the user to express himself about a positive evaluation of an already chosen target, "outdoor activities". Regarding the user's sentence, "don't like" is labelled as an attitude mark and "I" as its source. The target of the user's attitude, "being indoors", is linked to the target of the agent's attitude: even though the two are near-antonyms, an ontological reference exists between them.
4. Labelling user's attitude in a human-agent (operator) interaction: the SEMAINE corpus
The model is here confronted with the SEMAINE corpus (McKeown et al., 2011). As a preliminary study, one annotator labelled the corpus, but we plan to provide a second annotation with another annotator. This corpus comprises 65 manually-transcribed sessions where a human user interacts with a human operator acting the role of a virtual agent. These interactions are based on a scenario involving four agent characters: Poppy, happy and outgoing; Prudence, sensible and level-headed; Spike, angry and confrontational; and Obadiah, depressive and gloomy. The agent's utterances are constrained by a script (however, some deviations from the script occur in the database) with the aim of pushing the user toward the emotional state of the character played. Sixteen sessions, four for each character, were labelled according to the previously described annotation model. Thirteen different users are involved in these sessions. Regarding the agents, there are four different actors playing the role of Poppy, three for Prudence, three for Obadiah and three for Spike. In total, five different actors appear across the sessions.
As an annotation tool, we use the Glozz platform (Widlöcher and Mathet, 2012). By using Glozz, we can locate, identify and describe linguistic phenomena in textual data.
Overall analysis The labelled sessions have a variable number of speech turns (132 for the longest session, 38 for the shortest). In the entire corpus, the users and the agents have nearly equal numbers of speech turns: 559 for the users and 579 for the agents. Over the 450 labelled agent schemas (0.77 per speech turn), 46% express a directive act, 42% an expressive act and 12% a representative act. No declaration or commissive speech acts were found. This is probably due to the nature of the scenario on which the SEMAINE corpus is grounded: a narrative conversation where the user is pushed to talk about his life. As shown in Figure 2, the distribution of illocutionary acts is similar across agent identities, with the exception of Obadiah: in the Obadiah sessions, expressive acts hold the majority and occur more often than in the other agents' sessions.
As explained above, the operator's utterances can express attitudes. In our corpus, 339 operator schemas contain expressions of attitude: 187 affects and 152 evaluations. With regard to the agent's identity, the expressions of affect are more numerous than the evaluations, except for the Prudence sessions (see Figure 3). This is probably due to the personality of Prudence that the operator has to play: a sensible and level-headed person who expresses evaluations about the user's behaviour and asks the user to express attitudes about specific things.
The annotation model also gives an insight into the user's expressions of attitude (238 labelled user schemas). In particular, we show that expressions of evaluation are more frequent than expressions of affect, though not by a clear majority: 44% are affects and 56% evaluations.
Figure 2: Distribution of illocutionary acts according to the agent's identity
Figure 3: Distribution of affects and evaluations according to the agent's identity
Illocutionary acts and user's attitude Both the directive and expressive acts have a significant number of occurrences in the entire labelled corpus. Nevertheless, regarding the specific relation linking the user's attitudes to the agent's utterances, the directive illocutionary acts prevail: 57% of user's attitudes are linked with an agent's directive act. Most of the directive acts labelled in the corpus have an interrogative form: the agent asks the user to talk about something. Thus, as requests, these agent utterances may easily elicit an attitudinal reaction from the user. Moreover, some of them are explicit requests, from the agent to the user, to express himself about attitudes.
Polarity accordance The user's and the agent's attitudes are studied according to the type of agent played by the operator (see Table 1). As expected, the Poppy and Prudence sessions express more positive attitudes, whereas Spike essentially expresses attitudes with a negative polarity. The distribution is more balanced for Obadiah. Moreover, with regard to the relation between user schemas and agent schemas, the polarity of user's attitudes is mostly the same as the polarity expressed by the agent: the polarity of 71% of the user schemas linked to an agent schema containing an attitude matches the polarity of the agent's attitude. Furthermore, this polarity accordance occurs in most of the sessions (see Figure 4).
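The polarity-accordance measure described above can be sketched as a small function over the linked user/agent schema pairs. This is our reconstruction of the computation, with toy data rather than the corpus annotations:

```python
def polarity_accordance(pairs):
    """Fraction of user schemas whose polarity matches that of the
    linked agent schema, counting only agent schemas that actually
    contain an attitude (polarity "positive" or "negative").

    pairs: iterable of (user_polarity, agent_polarity) strings.
    """
    linked = [(u, a) for u, a in pairs if a in ("positive", "negative")]
    if not linked:
        return 0.0
    matches = sum(1 for u, a in linked if u == a)
    return matches / len(linked)

# Toy example (not the corpus data): 2 of the 3 linked pairs match.
pairs = [("positive", "positive"), ("negative", "negative"),
         ("negative", "positive"), ("positive", "none")]
rate = polarity_accordance(pairs)
```
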
Figure 4: For each session, number of user’s attitudes,
linked to an agent’s attitude, which have the same polarity
Sessions    User pos.  User neg.  Agent pos.  Agent neg.
Poppy       92%        8%         83%         17%
Prudence    96%        4%         67%         33%
Spike       20%        80%        40%         60%
Obadiah     41%        59%        52%         48%
Table 1: Polarity of user's and agent's attitudes according to the agent identity
Target Among the attitudes labelled by the user schemas (238 in the entire corpus), 172 have a target. When the user schema is linked to an agent schema containing an attitude, the user's target can refer to the agent's target: one relation in our model notifies this reference. In the entire labelled corpus, about a quarter (27%) of the targets appraised by the user refer to a target appraised by the agent. These results show that the user does not always choose which object he will appraise. This phenomenon needs to be considered in order to design a sentiment analysis module: with regard to the agent's target – which will be known by the system – it may be easier to process the potential user's attitudinal reaction and find its target.
Topic In the entire labelled corpus, 61 lexical units are labelled as topics and 51 as sub-topics – an average of 4 topics and 3.4 sub-topics per session. Among these topics and sub-topics, 57% are started by the agent, 30% by the user, and 13% arise from a kind of collaboration between the agent and the user. For example, in session 26, Poppy asks the user, "where is the best wake up you ever had?". The user answers "in a tent in kilimanjaro". Here, the agent's question opens a new topic but does not completely define it. It is the user's answer that chooses the new topic, while following the indications given by the agent's question. When the user's targets are not linked to an agent's target, they may be linked to a topic or a sub-topic. In the entire labelled corpus, out of a total of 172, 28 user targets are linked to a topic and 47 to a sub-topic. 32% of these topics and sub-topics are started by the agent, 40% by the user and 28% result from the collaboration described above between the agent and the user. Thus, as for the targets, the user's expressions of attitude are grounded on elements which arise in the conversation through the agent's speech.
5. Discussion and tracks to design a module
for detection of user’s attitudes
As shown by the framework of our model and its different units and features, we aim to describe the expressions of attitude in a compositional way, i.e. we consider their meanings as built from the meanings of their constituents. The categories of our model provide the first semantic values to describe this meaning and that of the agent's utterances. This compositional representation will be useful to build our attitude detection module.
First, the semantic values regarding the agent's utterances (illocutionary acts, attitudes, etc.) will allow us to process the user's ones. Since there is some semantic accordance between the agent's utterances and the user's attitudes, a semantic characterisation of each agent utterance could be used to anticipate the possible following user expressions of attitude and to improve their analysis when they occur. This characterisation could be implemented as a simplified semantic feature set and used as an input of the module. For instance, the feature set associated to the agent's sentence "Do you like outdoor activities" (see Figure 1) indicates that the agent's utterance invites the user to express an attitude. The user's expected attitude can thus be modelled in the module, which can check whether its source and its target are the same as the ones in the agent's utterance by using linguistic rules grounded on syntactic and semantic cues. If no accordance is found, a more complex analysis can be done.
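The anticipation step sketched above can be illustrated as follows. The shape of the feature set, its keys and the rule itself are our assumptions; the example mirrors the "Do you like outdoor activities" utterance of Figure 1:

```python
def anticipate(agent_features):
    """Given a simplified semantic feature set for the agent's
    utterance, return the attitude expected from the user, or None."""
    if (agent_features.get("act") == "directive"
            and agent_features.get("attitude_source") == "user"):
        # e.g. "Do you like outdoor activities?": the agent asks the
        # user to appraise an already-chosen target, so the module can
        # expect a user attitude with that source, target and type.
        return {
            "expected_source": "user",
            "expected_target": agent_features.get("target"),
            "expected_type": agent_features.get("attitude_type"),
        }
    return None  # no specific expectation: fall back to full analysis

features = {"act": "directive", "attitude_type": "evaluation",
            "attitude_source": "user", "target": "outdoor activities"}
expectation = anticipate(features)
```
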
Second, some refinements which will help the future detection module can be made in our model. Regarding the agent's utterances, as shown in Section 4, the expressive and the directive illocutionary acts have a large number of occurrences in the corpus. Thus, these categories could be refined. For example, two sub-categories could be linked to the directive illocutionary act category: question and suggestion. Such sub-categories could give more accurate information about the meaning of the agent's utterance, which will help to improve the performance of our detection module. For instance, if the feature set introduced above could specify that the sentence has an interrogative form, this would limit the number of likely user sentences to consider and allow the module to use a more specific linguistic rule to process them. Regarding the user's attitudes, other features could be added too. For example, with regard to the graduation dimension, we need to distinguish different semantic values among the valence modifiers or shifters: a negation will not have to be analysed in the same way as an intensifier. Moreover, it is important to provide – as an output – information about the degree of graduation of the expressed attitude. Indeed, the agent should not react in the same way to attitudes like "I like outdoor activities" and "I like outdoor activities very much". Finally, the model also needs to deal with the multimodality issue: to this end, the user's attitudes could also be linked to the agent's non-verbal signals.
6. Conclusion and further work
This paper proposes a model of user attitudes in verbal con-
tent. In order to go beyond the classical positive vs. neg-
ative distinction, this model examines some features as the
source, the target or the attitude type and deals with interac-
tion by integrating information about the illocutionary acts
and the relations between user and agent units. This model
– confronted with the SEMAINE corpus – shows that these
features are relevant. The user’s attitudes have properties
in common with the agent’s attitudes, like the polarity and,
less, the target. In further work, as explained in Section
5, the simplified semantic representation provided by the
model will be refined. By doing so, our model will be a
strong foundation for designing our detection model.
7. Acknowledgment
The authors thank Catherine Pelachaud for valuable in-
sights and suggestions. This work has been supported
by the European collaborative project TARDIS and the SMART Labex project.
8. References
Bednarek, M. (2009). Language patterns and attitudes. Functions of Language, 16(2):165–192.
Bloom, K., Garg, N., and Argamon, S. (2007). Extracting appraisal expressions. In HLT-NAACL, pages 308–315, April.
Breck, E., Choi, Y., and Cardie, C. (2007). Identifying expressions of opinion in context. In International Joint Conference on Artificial Intelligence (IJCAI), pages 2683–2688, San Francisco, CA. Morgan Kaufmann Publishers.
Campano, S., Durand, J., and Clavel, C. (2014). Compar-
ative analysis of verbal alignment in human-human and
human-agent interactions. In Language Resources and
Evaluation Conference. to appear, May.
Clavel, C., Pelachaud, C., and Ochs, M. (2013). User’s
sentiment analysis in face-to-face human-agent interac-
tions prospects. In Workshop on Affective Social Sig-
nal Computing, Satellite of Interspeech. Association for
Computational Linguistics, August.
Martin, J. R. and White, P. R. (2005). The Language of Evaluation: Appraisal in English. Palgrave Macmillan, London and New York.
McKeown, G., Valstar, M., Cowie, R., Pantic, M., and Schroder, M. (2011). The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affective Computing, 3(1):5–17, Jan–March.
Neviarouskaya, A., Prendinger, H., and Ishizuka, M.
(2010). Recognition of affect, judgment, and appreci-
ation in text. In Proceedings of the 23rd International
Conference on Computational Linguistics, COLING ’10,
pages 806–814, Stroudsburg, PA, USA. Association for
Computational Linguistics.
Osgood, C., May, W. H., and Miron, M. S. (1975). Cross-cultural Universals of Affective Meaning. University of Illinois Press, Urbana.
Pang, B. and Lee, L. (2008). Opinion mining and senti-
ment analysis. Foundations and Trends in Information
Retrieval, 2(1-2):1–135, January.
Schuller, B., Batliner, A., Steidl, S., and Seppi, D. (2011).
Recognising realistic emotions and affect in speech:
State of the art and lessons learnt from the first challenge.
Speech Communication, 53(9-10):1062–1087, November.
Searle, J. R. (1976). A classification of illocutionary acts.
Language in society, 5(01):1–23.
Whitelaw, C., Garg, N., and Argamon, S. (2005). Using appraisal taxonomies for sentiment analysis. In Proceedings of CIKM-05, the ACM Conference on Information and Knowledge Management, April.
Widlöcher, A. and Mathet, Y. (2012). The Glozz platform: A corpus annotation and mining tool. In Proceedings of the 2012 ACM Symposium on Document Engineering, pages 171–180, Paris, France, September.
Wiebe, J., Wilson, T., and Cardie, C. (2005). Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165–210.
... - Langlet, C. and Clavel, C. (2014b ...
... Au regard de l'annotation manuelle effectuée sur le corpus Semaine, il est possible d'avoir un premier aperçu de la manière dont l'agent peut influencer les expressions d'attitude de l'utilisateur, et comment ce dernier se place dans une relation de collaboration dialogique avec l'agent. Comme nous avons déjà expliqué (Langlet and Clavel, 2014b), sur un plan pragmatique, la nature de l'acte illocutoire précédant les expressions d'attitude de l'utilisateur est un premier indicateur. En effet, si les actes directifs et expressifs de l'agent ont chacun un nombre significatif d'occurrences dans l'ensemble du corpus annoté, au regard de la relation qui les associe aux attitudes exprimées par l'utilisateur, les actes directifs prévalent. ...
Cette thèse se situe à la croisée de deux domaines de recherche : celui de l'analyse des sentiments et celui des agents conversationnels animés. Les agents conversationnels animés peuvent être définis comme des personnages virtuels ayant la capacité de converser avec un utilisateur humain. Afin d’accroître les compétences communicationnelles de l’agent, il est important que celui-ci soit doté d’une forme d’intelligence socio-émotionnelle. L’agent doit être ainsi en capacité de gérer des signaux socio-émotionnels, tant du côté de la génération que de celui de la détection. Du côté de la génération, de nombreux travaux ont produit des modèles optimisant la production de gestes ou d’expressions faciales pour exprimer soit des émotions soit des attitudes sociales. Du côté de la détection, une majorité des travaux se concentrent sur l’analyse d’indices socio-affectifs non-verbaux (expressions faciales, indices acoustiques). Le contenu verbal et les expressions de sentiment qu’il véhicule restent quant à lui encore partiellement exploité. En effet, les rares études, intégrant un module de détection des sentiments de l’utilisateur dans le cadre de conversations humain-agent, ne prennent pas en compte les spécificités de ce contexte d’interaction. Pour combler cette lacune, notre travail s’intéresse à l’analyse du contenu verbal produit par l’utilisateur et à la manière dont celui-ci réfère ou exprime des sentiments, des affects ou des attitudes. Nous en proposons un modèle de détection au cours d’une interaction multi-modale et en face à face avec un agent conversationnel animé. Pour construire ce modèle, deux questions se sont posées à nous. Dans un premier temps, il nous a fallu identifier, au sein de la vaste classe des expressions de sentiment, celles qui apparaissaient comme les plus pertinentes pour l’élaboration des stratégies de communication de l’agent. 
Dans un second temps, nous avons dû choisir une méthode devant être non seulement opérante pour une analyse à grain fin de ces expressions, mais également adaptable au contexte conversationnel. Nos contributions s’articulent autour de trois axes. Tout d’abord, nous fournissons une analyse approfondie des expressions de sentiment. Trois unités conversationnelles sont considérées : le tour de parole, la paire adjacente et la séquence thématique. Cette analyse met en évidence un certain nombre de caractéristiques nécessaires au développement d’un ensemble de règles de détection. Ensuite, nous proposons un modèle de détection symbolique intégrant des règles sémantiques et des grammaires formelles. Ce modèle repose sur une analyse ascendante des énoncés – du niveau lexical au niveau phrastique – et se concentre successivement sur trois cadres d’analyse : le tour de parole, la paire adjacente et la séquence thématique. Enfin, nous proposons un protocole d’évaluation pour la validation des règles. Grâce à la création de deux plateformes d’annotation, nous avons pu créer deux jeux d’annotations sur deux corpus différents : un corpus de small-talk et un corpus de négociation. Les performances du système ont ainsi pu être évaluées par rapport aux références obtenues.
... Deep semantic analysis can be carried out considering each utterance independently from the previous ones or considering the whole conversation as a significant context to disambiguate the new inputs (analysis at the pragmatic level). Langlet and Clavel [193] takes a first step towards taking into account the interaction context by analyzing the SEMAINE database of human-agent interactions [194]. In particular, we have investigated how the interaction context can be considered by modeling the speech acts and attitudes of the agent in order to interpret user's sentiment. ...
... The second aspect specifies the use of the interaction context in order to improve the detection of sentiment-related phenomena (see Section 4.5). The model investigated in [193] and introduced in Section 5.1.2 has been implemented in [199] to analyze a specific sentiment-related phenomenon: the user's likes and dislikes. ...
... [20] proposed a classification between positive vs. negative user sentiment as an input of human-agent interaction. [31] provided a model of the user's attitudes in verbal content, grounded on the model of [32] and dealing with the interaction context: as the previous agent's utterance can trigger or constrain the user's expression of attitude, its target or its polarity, the model of the semantic and pragmatic features of the agent's utterance is used to help the detection of the user's attitude. Relying on the joint analysis of the agent's and the user's adjacent utterances, [33] provide a system able to detect the user's likes and dislikes, devoted to improving the social relationship between the agent and the user. ...
... We used the annotations made by Langlet and Clavel (2014, 2015b). This constitutes two different subsets of annotations. ...
Recognising a speaker's opinions in a spoken interaction is a crucial step towards improving communication between a human and a virtual agent. This thesis addresses, from an automatic speech processing perspective, opinion-related phenomena in natural, spontaneous spoken interactions. Opinion analysis is a task rarely tackled in speech processing, which until recently focused on emotions using vocal and non-verbal content. Moreover, most existing recent systems do not use the interactional context to analyse the speaker's opinions. This thesis tackles these issues. We work within an automatic detection framework based on statistical machine-learning models. After a study on modelling the dynamics of opinion with a latent-state model within a monologue, we investigate how to integrate the dialogic interactional context, and finally how to combine audio with text through different types of fusion. We worked on a database of vlogs at the level of a global sentiment, then on a database of multimodal dyadic interactions consisting of open conversations, at the level of the speaking turn and of the pair of speaking turns. Finally, we annotated a database with opinions, since the existing databases were not satisfactory for the task at hand and did not allow a clear comparison with other state-of-the-art systems. At the dawn of the major change brought by the advent of neural methods, we study different types of representations: the older hand-crafted representations, rigid but precise, and the newer statistically learned representations, general and semantic. We study different segmentations allowing the asynchronous nature of multimodality to be taken into account.
Lastly, we use a latent-state learning model that can adapt to a small database for the atypical task of opinion analysis, and we show that it both allows the descriptors to be adapted from the written domain to the spoken domain and can serve as an attention layer through its clustering power. Since complex multimodal fusion is not well handled by the classifier used, and since audio has less impact on opinion than text, we study different feature-selection methods to address these problems.
... Many studies have analyzed spoken dialog behavior of humans towards virtual agents (Campano et al., 2014;Langlet and Clavel, 2014;Veletsianos, 2012;Robinson et al., 2008;Kopp et al., 2005). These dialogs have been social in nature and any task is largely achieved through conversational means. ...
... the study of the Semaine corpus [36]. The rules make the system able to deal with various sentence structures: either verbal ("I hate being indoors") or adjectival ("Being indoors is very unpleasant"). ...
This paper introduces a knowledge-based system which grounds the detection of the user’s likes and dislikes on the topic structure of the conversation. The targeted study is set in a human-agent interaction with the aim to help the creation of dialogue strategies of an agent based on the user’s interests. In this paper, we first describe the system based on linguistic resources such as lexicons, dependency grammars and dialogue information provided by the dialogue system. Second, we explain how the system merges its outputs at the end of each topic sequence. Finally, we present an evaluation of both the linguistic rules and the merging process. The system enables a better identification of the target of the user’s likes and dislikes and provides a synthetic representation of the user’s interests.
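The abstract above describes merging per-utterance outputs at the end of each topic sequence. The sketch below illustrates one plausible merge policy, majority polarity per appraised target; the policy, the targets and the labels are assumptions for this example, not the paper's actual algorithm.

```python
from collections import Counter

# Illustrative merge of utterance-level like/dislike detections at the
# end of a topic sequence: one aggregated polarity per appraised target.
# The majority-vote policy is an assumption made for this sketch.

def merge_topic_sequence(detections):
    """detections: list of (target, polarity) pairs from one topic sequence."""
    votes = {}
    for target, polarity in detections:
        votes.setdefault(target, Counter())[polarity] += 1
    return {t: c.most_common(1)[0][0] for t, c in votes.items()}

seq = [("jazz", "like"), ("jazz", "like"), ("jazz", "dislike"),
       ("crowds", "dislike")]
print(merge_topic_sequence(seq))  # {'jazz': 'like', 'crowds': 'dislike'}
```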
Full-text available
Embodied conversational agents are capable of carrying a face-to-face interaction with users. Their use is substantially increasing in numerous applications ranging from tutoring systems to ambient assisted living. In such applications, one of the main challenges is to keep the user engaged in the interaction with the agent. The present chapter provides an overview of the scientific issues underlying the engagement paradigm, including a review on methodologies for assessing user engagement in human-agent interaction. It presents three studies that have been conducted within the Greta/VIB platforms. These studies aimed at designing engaging agents using different interaction strategies (alignment and dynamical coupling) and the expression of interpersonal attitudes in multi-party interactions.
Affective Computing aims at improving the naturalness of human-computer interactions by integrating the socio-emotional component in the interaction. The use of embodied conversational agents (ECAs) – virtual characters interacting with humans – is a key answer to this issue. On the one hand, the ECA has to take into account the human emotional behaviours and social attitudes. On the other hand, the ECA has to display relevant socio-emotional behaviours. In this paper, we provide an overview of computational methods used for the analysis of the user's socio-emotional behaviour and of human-agent interaction strategies, by questioning the ambivalent status of surprise. We focus on the computational models and on the methods we use to detect the user's emotion through language and speech processing, and present a study investigating the role of surprise in the ECA's answer.
SEMAINE has created a large audiovisual database as part of an iterative approach to building Sensitive Artificial Listener (SAL) agents that can engage a person in a sustained, emotionally coloured conversation. Data used to build the agents came from interactions between users and an 'operator' simulating a SAL agent, in different configurations: Solid SAL (designed so that operators displayed appropriate non-verbal behaviour) and Semi-automatic SAL (designed so that users' experience approximated interacting with a machine). We then recorded user interactions with the developed system, Automatic SAL, comparing the most communicatively competent version to versions with reduced nonverbal skills. High quality recording was provided by 5 high-resolution, high framerate cameras, and 4 microphones, recorded synchronously. Recordings total 150 participants, for a total of 959 conversations with individual SAL characters, lasting approximately 5 minutes each. Solid SAL recordings are transcribed and extensively annotated: 6-8 raters per clip traced five affective dimensions and 27 associated categories. Other scenarios are labelled on the same pattern, but less fully. Additional information includes FACS annotation on selected extracts, identification of laughs, nods and shakes, and measures of user engagement with the automatic system. The material is available through a web-accessible database.
The main task we address in our research is classification of text using fine-grained attitude labels. The developed @AM system relies on the compositionality principle and a novel approach based on the rules elaborated for semantically distinct verb classes. The evaluation of our method on 1000 sentences that describe personal experiences showed promising results: average accuracy on the fine-grained level (14 labels) was 62%, on the middle level (7 labels) 71%, and on the top level (3 labels) 88%.
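In the spirit of the compositionality principle and verb-class rules mentioned in this abstract, the following is a hypothetical miniature: each verb class carries a rule transforming the polarity of its complement. The verb classes, lexicon and rules are invented for illustration and are not @AM's actual resources.

```python
# Hypothetical sketch of compositional polarity: verb-class rules
# transform the polarity of the verb's complement.  All entries below
# are invented for this example.

VERB_CLASS = {
    "enjoy": lambda comp: comp,    # preserves complement polarity
    "fail":  lambda comp: -comp,   # reverses complement polarity
    "doubt": lambda comp: -comp,
}

ATTITUDE = {"success": +1, "win": +1, "failure": -1}

def compose(verb, complement):
    """Combine a verb-class rule with the polarity of its complement."""
    rule = VERB_CLASS.get(verb, lambda comp: comp)  # default: preserve
    return rule(ATTITUDE.get(complement, 0))

print(compose("fail", "win"))       # -1: 'fail to win' reverses the positive
print(compose("enjoy", "success"))  # 1
```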
While traditional information extraction systems have been built to answer questions about facts, subjective information extraction systems will answer questions about feelings and opinions. A crucial step towards this goal is identifying the words and phrases that express opinions in text. Indeed, although much previous work has relied on the identification of opinion expressions for a variety of sentiment-based NLP tasks, none has focused directly on this important supporting task. Moreover, none of the proposed methods for identification of opinion expressions has been evaluated at the task that they were designed to perform. We present an approach for identifying opinion expressions that uses conditional random fields and we evaluate the approach at the expression-level using a standard sentiment corpus. Our approach achieves expression-level performance within 5% of the human interannotator agreement.
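Expression-level evaluation, as described in this abstract, requires turning token-level sequence labels into opinion-expression spans. The sketch below decodes B/I/O predictions (as a CRF tagger would emit) into spans; the label scheme and example sequence are illustrative assumptions, not the paper's code.

```python
# Sketch: decode token-level B/I/O labels into opinion-expression spans
# for expression-level evaluation.  Example labels are illustrative.

def bio_to_spans(labels):
    """Return (start, end) token spans (end exclusive), one per expression."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == "B":
            if start is not None:
                spans.append((start, i))   # close the previous expression
            start = i
        elif lab == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        # "I" simply extends the current span
    if start is not None:
        spans.append((start, len(labels)))
    return spans

print(bio_to_spans(["O", "B", "I", "O", "B", "O"]))  # [(1, 3), (4, 5)]
```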
This is the first comprehensive account of the Appraisal Framework. The underlying linguistic theory is explained and justified, and the application of this flexible tool, which has been applied to a wide variety of text and discourse analysis issues, is demonstrated throughout by sample text analyses from a range of registers, genres and fields.
Corpus linguistics and Natural Language Processing make it necessary to produce and share reference annotations to which linguistic and computational models can be compared. Creating such resources requires a formal framework supporting description of heterogeneous linguistic objects and structures, appropriate representation formats, and adequate manual annotation tools, making it possible to locate, identify and describe linguistic phenomena in textual documents. The Glozz platform addresses all these needs, and provides a highly versatile corpus annotation tool with advanced visualization, querying and evaluation possibilities.
Interpersonal or evaluative meaning has been described in systemic functional linguistics with the help of appraisal theory (Martin & White 2005), which distinguishes between different types of evaluation. One sub-system of APPRAISAL is ATTITUDE, which is further divided into APPRECIATION, JUDGEMENT and AFFECT. This paper uses corpus-linguistic evidence to investigate how far linguistic patterns support this classification, and whether they can be used as a ‘diagnostic’ for distinguishing types of ATTITUDE (as has been proposed in appraisal theory). It argues that two different aspects of APPRAISAL need to be considered: the kinds of attitudinal lexis (in terms of evaluative standards which are inscribed in this lexis) and the kinds of attitudinal targets or types of attitudinal assessment , and that this distinction has not been sufficiently considered in appraisal theory so far. A preliminary classification of attitudinal lexis is also suggested, and a new sub-category of ATTITUDE proposed (COVERT AFFECT).
There are at least a dozen linguistically significant dimensions of differences between illocutionary acts. Of these, the most important are illocutionary point, direction of fit, and expressed psychological state. These three form the basis of a taxonomy of the fundamental classes of illocutionary acts. The five basic kinds of illocutionary acts are: representatives (or assertives), directives, commissives, expressives, and declarations. Each of these notions is defined. An earlier attempt at constructing a taxonomy by Austin is defective for several reasons, especially in its lack of clear criteria for distinguishing one kind of illocutionary force from another. Paradigm performative verbs in each of the five categories exhibit different syntactical properties. These are explained. (Speech acts, Austin's taxonomy, functions of speech, implications for ethnography and ethnology; English.)
More than a decade has passed since research on automatic recognition of emotion from speech has become a new field of research in line with its 'big brothers' speech and speaker recognition. This article attempts to provide a short overview on where we are today, how we got there and what this can tell us about where to go next and how we could arrive there. In a first part, we address the basic phenomenon reflecting the last fifteen years, commenting on databases, modelling and annotation, the unit of analysis and prototypicality. We then shift to automatic processing including discussions on features, classification, robustness, evaluation, and implementation and system integration. From there we go to the first comparative challenge on emotion recognition from speech – the INTERSPEECH 2009 Emotion Challenge, organised by (part of) the authors, including the description of the Challenge's database, Sub-Challenges, participants and their approaches, the winners, and the fusion of results to the actual learnt lessons before we finally address the ever-lasting problems and future promising attempts.
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area, of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.