
Predicting Collaborative Task Performance Using Graph Interlocutor Acoustic Network in Small Group Interaction

Authors:
Shun-Chang Zhong1,3, Bo-Hao Su1,3, Wei Huang4, Yi-Ching Liu2, Chi-Chun Lee1,3
1Department of Electrical Engineering, National Tsing Hua University
2College of Management, National Taiwan University
3MOST Joint Research Center for AI Technology and All Vista Healthcare
4Gamania Digital Entertainment Co., Ltd. (HQ)
flank03200@gmail.com, cclee@ee.nthu.edu.tw
Abstract
Recent works have demonstrated that integrating group-level personality and vocal behaviors can provide enhanced prediction power on task performance for small group interactions. In this work, we propose that the impact of member personality on task performance prediction in groups should be explicitly modeled from both intra- and inter-group perspectives. Specifically, we propose a Graph Interlocutor Acoustic Network (G-IAN) architecture that jointly learns the relationship between vocal behaviors and personality attributes with an intra-group attention mechanism and an inter-group graph convolutional layer. We evaluate our proposed G-IAN on two group interaction databases and achieve 78.4% and 72.2% group performance classification accuracy, outperforming a baseline model that uses vocal behavior only by 14% absolute. Further, our analysis shows that Agreeableness and Conscientiousness demonstrate a clear positive impact in our model, which leverages the inter-group personality structure for enhanced task performance prediction.
Index Terms: group interaction, personality, attention mechanism, graph convolutional network
1. Introduction
Small group interaction is a communication unit consisting of three to six members who exchange verbal and non-verbal messages in an attempt to influence one another during the decision-making process [1]; this particular type of interaction provides advantages for humans in completing intellectually challenging tasks, which often require teamwork. Each group's ability to complete a given cooperative task varies not only with its members' intellectual knowledge but also with their group-level interactive relationships. Studies of group dynamics suggest that the group-level influence between personality and performance may be associated with the match of personality characteristics to group member roles. For example, groups engaged in a cooperative task perform best when they are composed of one relatively dominant member and two or three average- or relatively low-dominance members [2]. During such an interaction, a good performance outcome is commonly considered the result of the right group composition.
Personality plays an important role in shaping the dynamics of group interaction. It is well known that, when assessing each group member's contribution to task performance at an individual level, each member's own personality shows a significant impact. However, it is important to also acknowledge that the role of these traits may differ when the group is considered as a whole, e.g., a team that behaves conscientiously and extrovertedly is not necessarily composed only of conscientious and extroverted members. In fact, the configuration of personality attributes has been conceptually associated with group processes since the early days of group dynamics research [3, 4]. Combinations of group member personality attributes often form different behavioral dynamics, which affect the group process and the quality of group performance through either collaborative talk or opinion conflict. Aside from the well-established literature showing that intra-group personality composition affects task performance, understanding how between-group structures are similar for a given task further helps in analyzing and understanding the seemingly heterogeneous group interaction behaviors [5].
Recently, computational research has progressed in developing methods that automatically predict group-level task performance from verbal/non-verbal behaviors during small group interactions [6, 7], and some research has started to investigate joint modeling approaches that consider the intertwining effect between members' vocal behaviors and intra-group personality compositions [8, 9]. While this past research has laid a solid foundation for predicting group performance from vocal behaviors by jointly modeling the effect of intra-group personality composition, these works do not take the inter-group personality structure into consideration. Intuitively, groups with similar personality compositions should have more correlated performance outcomes; the ability to explicitly exploit this inter-group personality structural dependency could lead to better prediction performance. Thus, in this work, we propose a Graph Interlocutor Acoustic Network (G-IAN) which not only models the intertwining effect between acoustic behaviors and personality within each group, but further represents the inter-group relationship using a group-based personality graph structure that is imposed on the acoustic representations when predicting task performance.
Specifically, our proposed G-IAN architecture predicts group-level performance on two datasets, the NTULP and the Gamania Group Interactive Database (GGID), which consist of face-to-face collaborative small group interactions, using acoustic features as inputs. The encoding of the inter-group personality structure is inspired by the successful use of graph convolutional networks (GCN) [10] in applications such as social networks [10], traffic forecasting [11], and disease prediction [12]. Our proposed G-IAN considers the intra-group and inter-group effects of personality on vocal behavior jointly: the intra-group personality effect on behavior is modeled by a personality-controlled attention mechanism, and the inter-group personality effect is represented using a GCN with the adjacency matrix obtained from group-level personality characterization.
Figure 1: A complete schematic of our Graph Interlocutor Acoustic Network (G-IAN). It applies a modified attention mechanism controlled by group-level personality, and models the inter-group relationship of personality with a graph convolutional layer for the recognition task.
Our results show that G-IAN achieves a promising unweighted average recall (UAR) of 72.2% and 78.4% in classifying group performance on the NTULP and the GGID, respectively. Further, our analysis reveals that Agreeableness and Conscientiousness constitute two key factors in linking groups' vocal representations for improved performance prediction accuracy.
2. Research Methodology
2.1. Dataset
2.1.1. The NTULP Audio-Video Database
The NTULP dataset includes 97 interaction sessions of people engaged in solving a collaborative school policy task [13]. There are three participants in each interaction, and each randomly takes on a different role: vice president of the university, vice president of the business school, or a member of the business school teachers committee. The goal of the task is to come up with a solution for a pre-designed issue, and the participants must communicate and discuss the information collaboratively on their own. In addition to audio and video recordings of each session, the NTULP contains metadata: age, gender, individual Big-Five personality traits, and group performance scores.
Personality. Participants were administered the five 10-item scales that measure the Big-Five personality traits [14]. They were asked to assess how accurately each statement described them on a 5-point scale, with anchors of 1 = very inaccurate and 5 = very accurate.
Group performance. The task performance of each group was evaluated by two trained research assistants according to the scoring manual for the task developed by Wheeler and Mennecke (1992) [13]. The manual lists over 300 possible solutions, each with two scores: a problem-solving score for how well the solution solves the problem, and a feasibility score for how feasible the solution is. The two assistants independently coded all 97 groups by finding the best match between each group's solutions and the potential solutions listed in the manual. When scoring disagreements arose, they discussed them with a third coder, one of the researchers, and the reconciled score was used in the analysis.
2.1.2. The Gamania Group Interactive Database
The Gamania Group Interactive Database (GGID) is a novel group interaction corpus collected proprietarily by the Gamania Digital Entertainment Company in Taiwan. Each session is a four-person interaction in which the participants jointly engage in a collaborative board game. Participants were asked to play a puzzle game in which they rearrange the facilities on four routes in order to reconnect the corresponding entrances to exits. Each group receives a group-level game reward depending on how well it solves this four-route puzzle in time, or a game punishment if it fails to solve it.
The GGID includes 31 sessions with 124 subjects in total (ages range from 21 to 55 years; 50 males and 74 females). The audio and video were recorded using one panoramic camera and four separate wireless directional microphones. The database also contains the following metadata: age, gender, individual personality traits, and group scores. Table 1 shows the class distribution of the binarized group performance scores (BGP) provided by Gamania.
Personality. Each participant's personality attributes, i.e., Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Imagination, are measured by the 15-item IPIP-15 scale [15, 16], which is translated and modified from the original 50-item IPIP scale [16].
Group Performance. The binarized task performance of each team is determined by the task completion distribution over all groups in this board game. Groups completing over 60% of the four-route puzzle are defined as high-performing, while those completing below 60% are defined as low-performing.
2.2. Graph Interlocutor Acoustic Network (G-IAN)
Figure 1 shows our proposed Graph Interlocutor Acoustic Network (G-IAN) framework. We model two participants per session in the NTULP database (the third participant is a pre-set examiner) and four participants per session in the GGID. Specifically, for the $i$-th group, our model uses a Bi-GRU with an attention mechanism trained on acoustic inputs to form the acoustic embedding, $x_m$, of the $m$-th interlocutor in the group, where $m = 1, \ldots, M$ (with $M$ being the number of group members).
Table 1: Class distribution of the binarized group performance (BGP).

Dataset   Low (0)   High (1)   Total
NTULP        36        61        97
GGID         18        13        31
2.2.1. Speaker Acoustic Features
First, the audio files in the NTULP and the GGID are segmented into speaker utterances using an automated voice activity detector (VAD). We then extract the sentence-level extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) [17] for each target speaker using the openSMILE toolkit [18]. The set comprises 88 features, including statistical functionals of mel-frequency cepstral coefficients (MFCCs), their associated deltas, and prosodic information.
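As a concrete illustration, the sketch below extracts utterance-level eGeMAPS functionals with the opensmile Python wrapper; the paper itself used the openSMILE toolkit directly, and the file path and VAD segment times here are hypothetical.

```python
import opensmile

# eGeMAPS functionals (88 dimensions per utterance); the paper used the
# original eGeMAPS set, for which v02 is a close publicly available variant.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Hypothetical (start, end) utterance boundaries produced by the VAD.
segments = [(0.0, 2.4), (3.1, 5.8)]
features = [
    smile.process_file("speaker1.wav", start=s, end=e)  # one 88-dim row each
    for s, e in segments
]
```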
2.2.2. Speaker Personality Features
An intuitive method of modeling group-level personality is to compute personality statistics within each group. The group-level personality features are obtained by computing the maximum, minimum, mean, and standard deviation (difference value for the NTULP dataset) of each trait within the group. Each member has 5 personality scores, so we derive a 20-dimensional feature vector, $P_{group}$, as the group-level personality input. Additionally, we retain each member's raw personality attributes as the individual-level personality input, $P_{indiv}$. The composite personality, $P_{all}$, is then derived by concatenating $P_{group}$ and $P_{indiv}$.
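A minimal sketch of this feature construction, assuming the standard-deviation variant and a simple flattening of the raw member scores (the exact ordering within $P_{indiv}$ is not specified in the text):

```python
import numpy as np

def composite_personality(traits: np.ndarray) -> np.ndarray:
    """traits: (M, 5) Big-Five scores for the M group members.
    Returns P_all = [P_group | P_indiv] as in Section 2.2.2."""
    p_group = np.concatenate([
        traits.max(axis=0),     # per-trait maximum
        traits.min(axis=0),     # per-trait minimum
        traits.mean(axis=0),    # per-trait mean
        traits.std(axis=0),     # per-trait std (a difference statistic for NTULP)
    ])                          # 20-dimensional P_group
    p_indiv = traits.flatten()  # raw individual-level attributes
    return np.concatenate([p_group, p_indiv])
```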
2.2.3. Personality Graph
The adjacency matrix, $A$, is built to represent the inter-group personality relationship between groups. $A$, a symmetric matrix of size $N \times N$, is defined as follows:

$$A_{ij} = \begin{cases} 1, & \text{if } i = j \\ \mathrm{cov}(P_i, P_j), & \text{otherwise} \end{cases}$$

where $P_i$ is the personality input ($P_{indiv}$, $P_{group}$, or $P_{all}$) of group $i$, and $\mathrm{cov}(P_i, P_j)$ is the covariance computed between group $i$ and group $j$ over the personality attributes. Intuitively, this personality graph models the inter-group personality relationship, i.e., how similar the personality of one group is to another: it connects two groups with a larger edge weight when their personality compositions are similar.
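A direct numpy sketch of this construction (the function name is ours; the covariance is computed between the two groups' personality vectors as described above):

```python
import numpy as np

def personality_adjacency(P: np.ndarray) -> np.ndarray:
    """P: (N, d) group-level personality inputs for N groups.
    Returns the symmetric N x N adjacency with unit self-loops."""
    N = P.shape[0]
    A = np.ones((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            # off-diagonal entries: covariance between the personality
            # vectors of group i and group j
            A[i, j] = A[j, i] = np.cov(P[i], P[j])[0, 1]
    return A
```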
2.2.4. Graph Interlocutor Acoustic Network
For each session, we first rank and label participants according to their speaking time, from most to least, e.g., we assign the interlocutors in the NTULP database as either the talkative or the talk-less subject. We train a Bi-GRU for each subject with the personality re-weighted attention mechanism of our previous work [9], defined as:

$$\alpha_t = \frac{\exp(u^\top y_t)}{\sum_t \exp(u^\top y_t)} \quad (1)$$

$$\alpha'_t = \alpha_t + \mathrm{ctrl}_t \quad (2)$$

$$\mathrm{ctrl}_{i \times t} = P_{i \times D} \times W_{D \times t} \quad (3)$$

where $\alpha'_t$ is the personality re-weighted attention weight, $\alpha_t$ is the self-attention weight, $y_t$ is the hidden state at time step $t$, and $\mathrm{ctrl}_t$ denotes the personality control weight derived from the group composite personality inputs described in Section 2.2.2. $D$ is the dimension of the personality attributes, $W$ is a trainable matrix, and $z$ is the re-weighted hidden representation.
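A PyTorch sketch of Eqs. (1)-(3), under the assumptions that sequences are padded to a fixed length T and that $z$ is pooled as the attention-weighted sum of the hidden states (the paper does not spell out the pooling):

```python
import torch
import torch.nn.functional as F

def personality_attention(y, u, P, W):
    """y: (T, H) Bi-GRU hidden states; u: (H,) attention context vector;
    P: (D,) composite personality vector; W: (D, T) trainable matrix."""
    alpha = F.softmax(y @ u, dim=0)  # Eq. (1): self-attention weights, (T,)
    ctrl = P @ W                     # Eq. (3): personality control term, (T,)
    alpha_prime = alpha + ctrl       # Eq. (2): re-weighted attention
    z = alpha_prime @ y              # assumed pooling: weighted sum, (H,)
    return z
```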
We consider a GCN with the following layer-wise propagation rule:

$$H^{(l+1)} = \sigma(A H^{(l)} W^{(l)}) \quad (4)$$

Here, $A$ is the adjacency matrix calculated in Section 2.2.3, $W^{(l)}$ is a layer-specific trainable weight matrix, $\sigma$ is the activation function (we use ReLU), and $H^{(l)}$ is the matrix of activations in the $l$-th layer. We then pass $z$ into the GCN:

$$z' = f(z, A) = A\,\mathrm{ReLU}(A z W^{(0)})\, W^{(1)} \quad (5)$$
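Eq. (5) applies the propagation rule of Eq. (4) twice; a minimal PyTorch module of that form could look as follows (class and dimension names are ours):

```python
import torch
import torch.nn as nn

class PersonalityGCN(nn.Module):
    """z' = A ReLU(A z W0) W1 over the N groups, as in Eq. (5)."""
    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.W0 = nn.Linear(in_dim, hid_dim, bias=False)
        self.W1 = nn.Linear(hid_dim, out_dim, bias=False)

    def forward(self, z: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        # z: (N, in_dim) group acoustic embeddings; A: (N, N) adjacency
        h = torch.relu(A @ self.W0(z))  # first propagation (Eq. 4)
        return A @ self.W1(h)           # second propagation
```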
After obtaining $z'$, we concatenate the meta attributes (age and gender) to $z'$ and feed the result into the prediction layer, which comprises five fully-connected layers, to perform binary classification. All parameters of our G-IAN are updated batch-wise using a cross-entropy loss with an L2 regularization term to prevent overfitting.
3. Experiment Setup and Results
3.1. Experiment Setup
We evaluate G-IAN on both the NTULP and the GGID using 5-fold cross-validation with unweighted average recall (UAR) as the metric. In this section, we present the compared methods, the model parameters, and our evaluation results.
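UAR is the macro-average of the per-class recalls over the two BGP classes; with scikit-learn it can be computed as below (the labels shown are made up):

```python
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 1]  # hypothetical fold labels
y_pred = [0, 1, 1, 1, 0]  # hypothetical fold predictions
uar = recall_score(y_true, y_pred, average="macro")  # mean per-class recall
print(f"UAR = {uar:.3f}")
```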
3.1.1. Model Comparison
Bi-GRU+ATT - Vocal Behavior Only
A typical Bi-GRU with attention trained for each subject to perform recognition directly.
Personality Network (PN) - Personality Only
The PN model from our previous work [9], which applies a 5-layer DNN to the composite personality features to perform recognition.
Bi-GRU+ATT+CTRL - Vocal Behavior + Intra-Group Effect
The Bi-GRU with the personality control mechanism integrated into its attention weights.
G-IAN without CTRL - Vocal Behavior + Inter-Group Effect
Our proposed architecture without the personality control mechanism.
Graph Interlocutor Acoustic Network (G-IAN)
Our full proposed architecture.
3.1.2. Model Parameters
Our proposed G-IAN is trained with the same hyper-parameters on the NTULP and the GGID: the number of hidden nodes in the Bi-GRU is 10 and the number of hidden nodes in the GCN layer is 20. The only difference is the node size of the prediction layer, which is composed of 5 fully-connected layers: [44, 50, 50, 32, 16, 2] for the NTULP and [82, 50, 50, 32, 16, 2] for the GGID. We use the ReLU activation function and apply batch normalization at the fourth layer. The model is optimized using the ADAM optimizer with a learning rate of 0.0005, a batch size of 16, and an L2 regularization weight λ of 0.007.
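A hypothetical instantiation of the NTULP prediction head and optimizer with the hyper-parameters above (where exactly the batch normalization sits relative to the fourth activation is our assumption, as is mapping the L2 term to Adam's weight decay):

```python
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(44, 50), nn.ReLU(),   # layers 1-3
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, 32), nn.ReLU(),
    nn.Linear(32, 16),              # layer 4, with batch normalization
    nn.BatchNorm1d(16), nn.ReLU(),
    nn.Linear(16, 2),               # layer 5: binary BGP output
)
optimizer = torch.optim.Adam(head.parameters(), lr=5e-4, weight_decay=0.007)
criterion = nn.CrossEntropyLoss()
```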
3.2. Result and Analysis
3.2.1. Analysis on Model Performance
Table 2 shows the complete prediction results. Our proposed G-IAN obtains the best overall UAR on the group performance classification task (72.2% on the NTULP and 78.4% on the GGID), outperforming the baseline model by 14.1% absolute UAR on the NTULP and 14.3% on the GGID. The baseline model is a standard Bi-GRU architecture with an attention mechanism that considers only the subjects' vocal behavior. The PN model uses a 5-layer DNN with group-level personality as its only input to perform task performance recognition. The accuracies obtained with these two baseline models (vocal behavior only, personality attributes only) are around 64%.
Table 2: Comparison of model performance using the metric of unweighted average recall (UAR). Overall, the G-IAN outperforms all other methods on the group performance classification task, achieving 72.2% UAR on the NTULP and 78.4% UAR on the Gamania (GGID).

                                      NTULP                      Gamania
Type                     Adj    Low UAR  High UAR  UAR     Low UAR  High UAR  UAR
Bi-GRU+ATT (Baseline)    -      55.6     60.7      58.1    66.7     61.5      64.1
PN                       indiv  72.2     50.8      61.5    55.6     61.5      58.5
                         group  58.3     68.5      63.6    61.1     61.5      65.3
                         all    72.2     55.7      64.0    66.7     61.5      64.1
Bi-GRU+ATT+Control       indiv  55.6     62.3      58.9    66.7     61.5      64.1
                         group  61.1     59.0      60.1    72.2     61.5      66.9
                         all    63.9     67.2      65.6    66.7     69.2      67.9
G-IAN without Control    indiv  63.9     68.9      66.4    66.7     61.5      64.1
                         group  63.9     65.6      64.7    66.7     69.2      67.9
                         all    63.9     70.4      67.2    72.2     69.2      70.7
G-IAN                    indiv  63.9     68.9      66.4    72.2     61.5      64.1
                         group  69.4     62.3      65.9    72.2     76.9      74.6
                         all    72.2     72.1      72.2    72.2     84.6      78.4
We find that neither the baseline model nor the PN has sufficient predictive power for the group performance. By modeling the intra-group personality effect on vocal behavior with the personality control attention mechanism, i.e., Bi-GRU+ATT+Control, the UAR increases slightly to around 66% and 68% on the NTULP and the GGID, respectively. Additionally, comparing the different personality representations ($P_{indiv}$, $P_{group}$, and $P_{all}$, as described in Sections 2.2.2 and 2.2.3), the model with the composite feature $P_{all}$, which incorporates $P_{indiv}$ and $P_{group}$, obtains the best prediction among the three.
Generally, the prediction results of the G-IAN, which takes the inter-group personality effect into account, improve by 2-3% over Bi-GRU+ATT+Control, which lacks the graph convolutional structure. By jointly modeling the intra-group and inter-group personality effects on vocal behavior, i.e., our proposed G-IAN with the personality control attention mechanism, we achieve the best performance of 72.2% on the NTULP and 78.4% on the GGID. Our G-IAN also outperforms the results of our previous work [9] (evaluated only on the NTULP), which models only the intra-group personality effect and acoustic behavior. In summary, the results indicate the need to take both the inter-group and intra-group effects of personality into account, which is shown to be beneficial in this group performance prediction task.
3.2.2. Analysis of Personality Graph
Our experiments demonstrate that modeling the intra-group and inter-group personality effects helps improve the overall prediction accuracy. We would like to further analyze the impact of different personality compositions on this graph structure. Specifically, we quantify the graph structure using a clustering analysis and a connectivity analysis. To investigate which personality trait has the greatest impact on the graph structure, we perform the following graph analysis: we remove each of the five personality attributes in turn and examine the changes in the structure.
Table 3: Clustering analysis and connectivity analysis of the personality graph under different personality compositions. The nodes are the sessions in Fold 3 of the NTULP (78 nodes) and Fold 2 of the GGID (24 nodes).

                             GGID               NTULP
Removing Attribute      C      CN   CE     C      CN   CE
None                    0.139  17   18     0.465  78   78
Extraversion            0.124  16   18     0.461  78   78
Agreeableness           0.143  16   17     0.473  77   77
Conscientiousness       0.147  15   17     0.482  77   76
Neuroticism             0.144  17   17     0.447  78   78
Openness/Imagination    0.136  17   19     0.391  78   78
We evaluate the clustering level of the graph structure by calculating the average clustering coefficient, $C$, defined as follows:

$$C_i = \frac{2|e_{jk}|}{k_i(k_i - 1)} \quad (6)$$

$$C = \frac{1}{N_{nodes}} \sum_{v} C_v \quad (7)$$

where $k_i$ is the number of neighbours of a vertex $v_i$, with $v_j, v_k \in G$ and $e_{jk} \in E$. $C_i$ is the fraction of pairs of neighbours of a given node that are connected to each other by edges, and $C$ is the average of all local coefficients. The higher the clustering coefficient, the denser the graph. For the connectivity analysis, we calculate the node connectivity (CN) and the edge connectivity (CE), i.e., the minimum number of nodes/edges that must be removed in order to disconnect the network.
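These statistics are straightforward to compute with networkx once the weighted adjacency is turned into a graph; how the paper binarizes the covariance weights is not stated, so the threshold below is an assumption:

```python
import networkx as nx
import numpy as np

def graph_statistics(A: np.ndarray, threshold: float = 0.0):
    """Return (C, CN, CE) for the personality graph, as in Table 3."""
    N = A.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(N))
    for i in range(N):
        for j in range(i + 1, N):
            if A[i, j] > threshold:   # assumed binarization of edge weights
                G.add_edge(i, j)
    C = nx.average_clustering(G)      # Eq. (7)
    CN = nx.node_connectivity(G)      # min nodes whose removal disconnects G
    CE = nx.edge_connectivity(G)      # min edges whose removal disconnects G
    return C, CN, CE
```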
Table 3 shows the changes in the personality graph built from $P_{all}$ on the training sets of the GGID and the NTULP as each personality attribute is removed. We observe that Agreeableness and Conscientiousness are the two attributes whose removal makes the graph more locally clustered yet less connected. In other words, the distance between groups grows when these two attributes are missing, which makes the graph more divergent and makes it more difficult for our model to leverage similar patterns between groups.
4. Conclusions
In small group collaborative tasks, the ability to automatically predict task performance from vocal cues relates not only to the intra-group effect of personality composition but also to the inter-group relationship of personality compositions. In this work, we proposed a Graph Interlocutor Acoustic Network that not only integrates the intra-group effect of personality attributes into acoustic behaviors through an attention mechanism, but also models the inter-group personality relationship as a graph structure with a GCN. We obtain competitive group performance prediction accuracies on the NTULP (72.2%) and the GGID (78.4%) datasets. In summary, the combination of a time-series model and a graph-based deep network provides a novel approach to studying the personality effect and the speech dynamics within groups. We will continue to investigate the effect of personality traits and advance our technical framework so that it can adapt to more complex conversational environments.
5. References
[1] S. Tubbs, A Systems Approach to Small Group Interaction. McGraw-Hill, 1995. [Online]. Available: https://books.google.com.tw/books?id=PCjTVXIIPI0C
[2] E. E. Ghiselli and T. M. Lodahl, "Patterns of managerial traits and group effectiveness," The Journal of Abnormal and Social Psychology, vol. 57, no. 1, p. 61, 1958.
[3] W. Haythorn, "The influence of individual members on the characteristics of small groups," The Journal of Abnormal and Social Psychology, vol. 48, no. 2, p. 276, 1953.
[4] B. Barry and G. L. Stewart, "Composition, process, and performance in self-managed groups: The role of personality," Journal of Applied Psychology, vol. 82, no. 1, p. 62, 1997.
[5] M. Kompan and M. Bieliková, "Social structure and personality enhanced group recommendation," in UMAP Workshops, 2014.
[6] G. Murray and C. Oertel, "Predicting group performance in task-based interaction," in Proceedings of the 20th ACM International Conference on Multimodal Interaction, 2018, pp. 14-20.
[7] U. Kubasova, G. Murray, and M. Braley, "Analyzing verbal and nonverbal features for predicting group performance," arXiv preprint arXiv:1907.01369, 2019.
[8] Y.-S. Lin and C.-C. Lee, "Using interlocutor-modulated attention BLSTM to predict personality traits in small group interaction," in Proceedings of the 20th ACM International Conference on Multimodal Interaction, 2018, pp. 163-169.
[9] S.-C. Zhong, Y.-S. Lin, C.-M. Chang, Y.-C. Liu, and C.-C. Lee, "Predicting group performances using a personality composite-network architecture during collaborative task," in Proc. Interspeech 2019, 2019, pp. 1676-1680.
[10] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
[11] X. Geng, Y. Li, L. Wang, L. Zhang, Q. Yang, J. Ye, and Y. Liu, "Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3656-3663.
[12] S. Parisot, S. I. Ktena, E. Ferrante, M. Lee, R. Guerrero, B. Glocker, and D. Rueckert, "Disease prediction using graph convolutional networks: application to autism spectrum disorder and Alzheimer's disease," Medical Image Analysis, vol. 48, pp. 117-130, 2018.
[13] B. C. Wheeler and B. E. Mennecke, "The school of business policy task manual: Working paper #92-524c," 1992.
[14] L. R. Goldberg, "The development of markers for the Big-Five factor structure," Psychological Assessment, vol. 4, no. 1, p. 26, 1992.
[15] L. Zheng, L. R. Goldberg, Y. Zheng, Y. Zhao, Y. Tang, and L. Liu, "Reliability and concurrent validation of the IPIP Big-Five factor markers in China: Consistencies in factor structure between internet-obtained heterosexual and homosexual samples," Personality and Individual Differences, vol. 45, no. 7, pp. 649-654, 2008.
[16] R.-H. Li and Y.-C. Chen, "The development of a shortened version of IPIP Big Five personality scale and the testing of its measurement invariance between middle-aged and older people," Journal of Educational Research and Development, vol. 12, no. 4, pp. 87-119, 2016.
[17] F. Eyben, K. R. Scherer, B. W. Schuller, J. Sundberg, E. André, C. Busso, L. Y. Devillers, J. Epps, P. Laukka, S. S. Narayanan et al., "The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing," IEEE Transactions on Affective Computing, vol. 7, no. 2, pp. 190-202, 2015.
[18] F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE: the Munich versatile and fast open-source audio feature extractor," in Proceedings of the 18th ACM International Conference on Multimedia, 2010, pp. 1459-1462.