Predicting Group Performances using a Personality Composite-Network
Architecture during Collaborative Task
Shun-Chang Zhong1,3, Yun-Shao Lin1,3, Chun-Min Chang1,3, Yi-Ching Liu2, Chi-Chun Lee1,3
1Department of Electrical Engineering, National Tsing Hua University, Taiwan
2College of Management, National Taiwan University, Taiwan
3MOST Joint Research Center for AI Technology and All Vista Healthcare, Taiwan

Abstract
Personality has been studied not only at the individual level; its composite effect among team members has also been shown to relate to overall group performance. In this work, we propose a Personality Composite-Network (P-CompN) architecture that models the group-level personality composition, integrating its intertwining effect into the network modeling of team members' vocal behaviors, in order to predict group performance during collaborative problem-solving tasks. Specifically, we evaluate our proposed P-CompN on a large-scale dataset consisting of three-person small group interactions. Our framework achieves a promising group performance classification accuracy of 70.0%, which outperforms a baseline model that uses only vocal behaviors without personality attributes by 14.4% absolute. Our analysis further indicates that our proposed personality composite network impacts the vocal behavior models more significantly for high performing groups than for low performing groups.
Index Terms: group interaction, personality traits, attention
mechanism, social signal processing
1. Introduction
A small group, typically three to six people, is the most common unit for collective decision making in the workplace. Group scholars have long tried to understand which ingredients of group membership lead to a better (more effective) collective decision-making process. Group composition is the configuration of member attributes in a team, and the composition of personality has been shown to have a direct effect on group performance. Bell's 2007 meta-analysis showed that each of the Big-5 personality traits can directly impact a team's performance [1]. Furthermore, studies have shown that it is not only the individual team member's personality that has an effect; different configurations of personality attributes among members bring different impacts to the group [2, 3]. For example, the team-level averages of both 'openness to experience' and 'emotional stability' moderate the relationship between team conflict and team performance [4], and the variability of 'agreeableness' and 'neuroticism' is negatively related to a team's oral presentation performance [5].
The right composition of team members is not only evident
in their group-level personality composition but also manifested
in ‘how each interacts with one another behaviorally’ during
a small group interaction. The behavioral patterns observed at the group level during each interaction session are uniquely formed over time as each member expresses themselves, exchanges ideas, and gears toward consensus or conflict [6, 7, 8, 9]. In fact, engineering researchers have already devoted extensive effort to computationally understanding these different interaction processes through automated analyses of verbal and non-verbal behaviors. For example, Okada et al. computed co-occurrences of each participant's non-verbal behaviors with those of the others to model group impressions [10]; Fang et al. designed intra- and inter-personal audio-video behavior features to perform personality classification during small group interactions [11]; Batrinca et al. recognized personality attributes using acoustic and visual nonverbal features [12]. Most recently, Lin et al. proposed an interlocutor-modulated attention network that reached state-of-the-art personality recognition accuracy in small group interaction [13].
In this work, we focus on automatically predicting group-level performance during a three-person interaction on a collaborative school policy task. Past research on predicting group performance is very limited. For example, Murray et al. developed hand-crafted multimodal behavior features of small group conversations to predict team performance on a collaborative task [14]; Avci et al. computed a large set of features, including nonverbal multimodal cues, personality traits, and diverse interpersonal perceptions, to predict group performance [15]. While Avci et al. integrated personality attributes to demonstrate improved accuracy, their framework modeled personality attributes simply as auxiliary independent inputs, without considering the intertwining effect of the group's composite personality with individual members' behaviors. In fact, group personality is known to influence team performance in two ways: as an input factor that can increase or decrease the group's overall resources, and as a modulating factor that shapes teamwork processes [16]. Developing sophisticated frameworks that model this intertwining effect is crucial for advancing automated behavior modeling in small group interactions.
In this work, we propose a Personality Composite-Network
(P-CompN) architecture to predict group performance on a
large-scale dataset including 97 sessions of three-person inter-
actions. The P-CompN includes a fusion of two major net-
works, i.e., Interlocutor Acoustic Network (IAN) and Personal-
ity Network (PN). The PN network predicts team performance
based on the group-composite Big-5 personality attributes. The
IAN network is trained using bi-directional long short-term
memory network (BLSTM) with attention being modulated
by the group-composite personality. Our P-CompN considers both the group-level personality configuration and the team members' acoustic behaviors, with their intertwining effect jointly modeled. P-CompN achieves a promising unweighted average recall (UAR) of 70.0% in classifying group performances.
Our analysis further reveals that group-composite personality attributes significantly alter the IAN's attention weights between the high and the low performing groups.

Copyright © 2019 ISCA, September 15–19, 2019, Graz, Austria

Figure 1: A complete schematic of our Personality Composite-Network (P-CompN). It includes an Interlocutor Acoustic Network (IAN) and a Personality Network (PN) with decision-level fusion for the classification task. Specifically, we propose to learn a personality composite control weight that modifies the original BLSTM attention mechanism, jointly modeling the effect of group-level personality attributes on participants' acoustic behaviors.
2. Research Methodology
2.1. The NTHULP Audio-Video Database
Our NTHULP audio-video database was collected at the College of Management of National Taiwan University (NTU). Each recording includes a session of three participants engaging in a collaborative school policy task [17]. The three participants play different randomly-assigned roles: vice president of the university, vice president of the business school, and a member of the business school teachers' committee. They are asked to solve school problems by discussing potential alterations to the school policy. Each participant is given a piece of relevant information that differs from the others' in the team, and they are asked to work together by sharing ideas and communicating collaboratively. However, one of the three participants is a sleeper cell assigned by the experimental personnel. While the sleeper cell knows all the detailed information needed to complete the task, he/she takes part only passively. The goal of this setup is to study the interaction of the other two participants, to understand how they may be influenced by the sleeper cell's unresponsive behaviors and its effect on the outcome of this collaborative task.
The NTHULP contains 97 recorded sessions with 194 subjects in total (ages range from 19 to 51; 95 males and 99 females). It includes audio and video recordings collected using two cameras and three separate wireless lapel microphones. Additionally, the database contains the following metadata: individual personality traits and group performance outcome scores.
Figure 2: A histogram of the group performance score.
Personality. Each participant's Big-5 personality attributes, i.e., Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness, are measured using Goldberg's (1992) 10-item scale [18]. Participants are asked to evaluate how accurately each statement describes them on a 5-point scale, anchored at 1 = "very inaccurate" and 5 = "very accurate".
Group Performance. The performance of each team is evaluated by two trained research assistants using the scoring manual for the school policy task developed by Wheeler and Mennecke [17]. The scoring manual includes over 300 possible solution scores for this task scenario. The scoring covers two distinct dimensions: a problem-solving score for how well the solution solves the case problem, and a feasibility score for how feasible the solution is. The two research assistants independently code all 97 groups by identifying the best match between each group's final decision and the solutions listed in the manual. Any disagreement between the two coders is reconciled by a third coder.
In this work we use the binarized feasibility score as the class label indicating group performance. We define class 1 as high performing groups, with a score greater than or equal to 50, and class 0 as low performing groups, with a score less than 50. Figure 2 depicts the distribution of the feasibility score in the database.
2.2. Personality Composite-Network (P-CompN)
Figure 1 shows our Personality Composite-Network (P-CompN) architecture. In this group performance prediction task, we model only the two actual participants within each session, ignoring the sleeper cell due to his/her consistently non-engaging behavior. Specifically, our proposed P-CompN architecture is composed of a fusion of two sub-networks, the Interlocutor Acoustic Network (IAN) and the Personality Network (PN). We first describe the two feature inputs to P-CompN and then the details of our framework.
2.2.1. Feature Inputs: Acoustics and Personality Attributes
The audio signals are first segmented into speaker utterances
automatically. We extract the extended Geneva minimalistic
acoustic parameter set (eGeMAPS) for each utterance [19] as
acoustic inputs. eGeMAPS comprises 88 features, including statistical properties of mel-frequency cepstral coefficients (MFCCs), their associated deltas, and prosodic information.

Table 1: Model performances using the metric of unweighted average recall (UAR) for high and low performing groups. The overall result shows that P-CompN outperforms all other methods in the group performance classification task, achieving 70.0% UAR.

Individual Models (talkative/talk-less)

Model     Overall      Low          High
Model 0   55.6/54.4    47.2/41.7    64.0/67.2
Model 1   55.6/53.2    47.2/50.0    63.9/56.5
Model 2   58.5/59.1    69.4/44.4    47.5/73.8

Group Models

Model     Overall   Low    High
Model 3   58.1      55.6   60.7
Model 4   60.1      61.1   59.0
IAN       63.1      63.9   62.3
PN        63.6      58.3   68.5
P-CompN   70.0      77.8   62.3
In terms of personality attributes, since each interlocutor has different traits, an intuitive way to measure the personality composition characteristics within the group is to compute statistics across members. Specifically, each member has 5 personality scores, and we compute the maximum, minimum, mean, and difference values over the group members to derive a 20-dimensional feature vector as the personality-attribute input.
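As a concrete sketch of these composite statistics, the per-trait operations over the two modeled members might look as follows (whether 'difference' is signed or absolute is our assumption, as is the function name):

```python
import numpy as np

def personality_composite(p1, p2):
    """Derive the 20-dim group-composite personality features from two
    members' Big-5 score vectors (5 values each): per-trait maximum,
    minimum, mean, and (assumed absolute) difference, concatenated."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    stacked = np.stack([p1, p2])          # shape (2, 5)
    return np.concatenate([
        stacked.max(axis=0),              # 5 maxima
        stacked.min(axis=0),              # 5 minima
        stacked.mean(axis=0),             # 5 means
        np.abs(p1 - p2),                  # 5 differences
    ])                                    # shape (20,)
```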
2.2.2. Interlocutor Acoustic Network (IAN)
The core of IAN uses BLSTM with an attention mechanism
trained on the acoustic inputs. Each utterance is a time step
t. For each session, we first assign the interlocutors as either the talkative or the talk-less subject; the talkative subject is the person who speaks the most and often takes the leading role in the interaction, while the talk-less subject tends to be quieter and more tolerant of the presence of an assertive person. We train a typical BLSTM for each subject with attention weight α_t defined as:

α_t = exp(f(y_t)) / Σ_{t'} exp(f(y_{t'}))    (1)

where y_t is the hidden-layer output at time step t and f(·) is a learnable scoring function.
In this work, we design a novel personality control mecha-
nism that integrates the effect of group personality composition
into the attention weight. Specifically, we multiply the 20-dimensional personality composite features by a learnable weight matrix W_{20×t} to derive the personality control weight for the i-th sample as below:

ctrl_{i×t} = P_{i×20} × W_{20×t}    (2)

where P_{i×20} indicates the group-composite personality inputs described in Section 2.2.1, and W is normalized to sum to 1 by a softmax. We can then reweight the original attention weight:

α''_t = ctrl_t × α_t    (3)
With this personality-reweighted attention mechanism, we further derive the representation of the IAN, z_IAN, by concatenating the pooled BLSTM hidden-layer outputs of the talkative and the talk-less subject:

z_IAN = [ G(z''_talkative), G(z''_talk-less) ]    (4)

where z'' denotes a subject's BLSTM hidden-layer outputs weighted by the modified attention, and G indicates a functional pooling layer over time, i.e., computing the maximum, minimum, mean, median, and standard deviation of the hidden-layer outputs of the BLSTM. After obtaining z_IAN, we feed it into a prediction layer consisting of five fully-connected (DNN) layers to perform binary classification.
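A minimal NumPy sketch of this personality-controlled attention with functional pooling for one subject follows; the attention scoring function and the softmax-normalization axis of W are our assumptions, and the names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def personality_controlled_pooling(H, P, W):
    """H: (T, d) BLSTM hidden outputs for one subject.
    P: (20,) group-composite personality features.
    W: (20, T) learnable control matrix (assumed trained elsewhere)."""
    alpha = softmax(H.sum(axis=1))        # original attention weights (stand-in scorer)
    ctrl = P @ softmax(W, axis=0)         # personality control weight, cf. Eq. (2)
    alpha2 = ctrl * alpha                 # reweighted attention, cf. Eq. (3)
    Z = alpha2[:, None] * H               # attention-weighted hidden outputs
    # functional pooling G over time: max, min, mean, median, std
    return np.concatenate([Z.max(0), Z.min(0), Z.mean(0),
                           np.median(Z, axis=0), Z.std(0)])
```

For a BLSTM output dimension of 128, this pooling yields 640 values per subject and 1280 after concatenating both subjects, matching the size of IAN's first fully-connected layer.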
2.2.3. Personality Network (PN)
The other sub-network is the Personality Network (PN). PN is based on an 8-layer DNN that takes as input the 20-dimensional group-composite personality features and predicts the group performance directly.
The final prediction using our P-CompN in the binary group
performances is based on the average softmax output probabil-
ity of IAN and PN.
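A minimal sketch of this decision-level fusion (function name ours):

```python
import numpy as np

def fuse_predictions(prob_ian, prob_pn):
    """Average the softmax output probabilities of IAN and PN and
    return the argmax as the predicted group-performance class."""
    avg = (np.asarray(prob_ian, dtype=float) + np.asarray(prob_pn, dtype=float)) / 2.0
    return int(np.argmax(avg))
```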
3. Experiment Setup and Results
3.1. Experiment Setup
In this section we briefly describe different comparison meth-
ods, model parameters, and our evaluation scheme.
3.1.1. Model Comparison
Model 0-Baseline
Using a standard talkative-only or talk-less-only subject's BLSTM with attention (without DNN layers) to perform recognition directly.
Model 1-Individual Personality Network
Using an 8-layer DNN to model the talkative or talk-less subject's five personality attributes only (not the composite statistical measures) to perform recognition directly.
Model 2-BLSTM + Individual Personality Network
Combining Model 0 and Model 1 via decision-level fusion by averaging the output probabilities to perform recognition.
Model 3-Dual-BLSTM
Concatenating the output of each interlocutor's BLSTM using a summation pooling layer (not the functional pooling layer) and feeding it to a five-layer DNN to perform recognition.
Model 4-Dual-BLSTM + Personality Control
Integrating Model 3 with the personality control mechanism applied to the BLSTM attention weights to perform recognition.
Interlocutor Acoustic Network (IAN)
Using the method detailed in Section 2.2.2, which modifies Model 4 by replacing the summation layer with the functional pooling layer to perform recognition.
Personality Network (PN)
Using the method in Section 2.2.3, an 8-layer DNN on the personality composite features to perform recognition.
Personality Composite-Network (P-CompN)
Using our proposed architecture to perform recognition.
3.1.2. Other Experimental Parameters
We pad utterance sequences to equal length before training (224/147 time steps for the talkative/talk-less subject, respectively), and each BLSTM is trained with this fixed number of steps. The number of hidden nodes in the BLSTM is 64. IAN has 5 fully-connected layers with node sizes of 1280, 640, 256, 256, 128, and 2 (input included); PN has 8 fully-connected layers with node sizes of 20, 64, 64, 64, 32, 32, 32, 16, and 2 (input included). We use ReLU as the activation function, dropout on the first and last layers, and batch normalization. The batch size is set to 16, and the learning rate is set to 0.0005 with the ADAM optimizer. Cross-entropy is the loss function, and we train our networks for 40 epochs. The experiment is carried out using 5-fold cross-validation with the metric of unweighted average recall (UAR). We stratify the folds so that the label distribution is consistent across them, reducing bias.
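The evaluation metric, unweighted average recall, can be sketched as the mean of per-class recalls (helper name ours):

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """UAR: the mean of per-class recalls, so both classes count
    equally regardless of class imbalance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return float(np.mean(recalls))
```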
3.2. Results and Analyses
3.2.1. Analysis on Model Performance
Table 1 summarizes our complete prediction results. Our proposed P-CompN obtains the best overall UAR (70.0%), which is 14.4% (absolute) higher than the baseline Model 0. Model 0 and Model 1 model acoustic behaviors and personality attributes using an individual participant only (talkative-only or talk-less-only). The accuracy obtained with these two models is only around 55%, and by combining the complementary information of the individual acoustic-behavior and personality models, i.e., Model 2, it increases slightly to around 59%. We observe that modeling a single participant within a small group collaborative task is not sufficient to obtain adequate predictive power for group performance. Generally, the accuracy obtained by the 'Group Models', i.e., modeling both participants, is better than that of the 'Individual Models'.
Furthermore, Model 3 and Model 4 differ in whether the participants' acoustic BLSTMs have their attention mechanism modulated by the personality composite control weight. Model 4 improves about 2% over Model 3 in predicting group performance, which indicates that the group personality information indeed jointly affects the behavioral manifestation when completing this collaborative task. IAN replaces the conventional summation part of the BLSTM attention mechanism with a functional pooling layer; this computes statistical properties over the time-series output of the BLSTM weighted by the personality-controlled attention mechanism. The functional pooling provides another 3% improvement, indicating the need for a more complex characterization of the temporal dynamics of the participants' acoustic behaviors, which is shown to be beneficial in this group performance recognition task.
Finally, we also note the interesting observation that PN by itself achieves 63.6% UAR in the group performance prediction task. Our experiments demonstrate that the group members' personality configuration carries significant information about team performance, which corroborates past literature in group studies [2, 3]. In summary, our P-CompN architecture fuses the prediction outputs of IAN (63.1%) and PN (63.6%) to obtain the best performing model at 70.0% UAR.
3.2.2. Analysis of Personality and Group Performance
Our experiments demonstrate that personality composite features computed within a group can be used to predict team performance in this school policy collaborative task. To understand the influence of group personality on team performance, we compute the Spearman correlation between each of the 20 dimensions of the composite group personality measures and our target group performance label.

Table 2: Values marked with * indicate a statistically significant correlation between group performance and the corresponding Big-5 personality composite attribute.

Big-5              Max     Min     Mean    Difference
Extraversion      -0.06   -0.04   -0.06    0.07
Agreeableness      0.17*   0.06    0.16*   0.09
Conscientiousness -0.01    0.01   -0.01    0.01
Neuroticism       -0.01    0.20*   0.08   -0.14
Openness           0.10   -0.05    0.01    0.05
Table 2 includes the correlation results; marked values indicate a significant correlation at the α = 0.05 level. We observe that the maximum and the mean of Agreeableness and the minimum of Neuroticism are positively correlated with group performance. Previous studies have also shown that Agreeableness is one of the most important personality traits for team performance due to its emphasis on cooperation and facilitation: if the group members treat each other in a friendly manner (max_Agree), show patience (min_Neur), and keep a collaborative atmosphere (mean_Agree), the interaction becomes more engaging and comfortable, helping the team finish the task collaboratively and with quality [20, 21, 22, 23].
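The Spearman correlation used above is the Pearson correlation of rank-transformed variables; a dependency-free sketch:

```python
import numpy as np

def spearman_corr(x, y):
    """Spearman rank correlation with average ranks for tied values."""
    def rank(v):
        order = np.argsort(v)
        r = np.empty(len(v), dtype=float)
        r[order] = np.arange(1, len(v) + 1)
        for val in np.unique(v):          # average the ranks of tied values
            tied = (v == val)
            r[tied] = r[tied].mean()
        return r
    rx = rank(np.asarray(x, dtype=float))
    ry = rank(np.asarray(y, dtype=float))
    rx, ry = rx - rx.mean(), ry - ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))
```

In practice, a library routine such as scipy.stats.spearmanr also returns the p-value needed for the α = 0.05 significance test.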
3.2.3. Analysis of Attention Weights
In Section 3.2.1, we demonstrated that personality-controlled reweighting of the BLSTM attention network helps improve the overall prediction accuracy. We would like to further analyze these modified attention weights, which re-emphasize the important regions of interlocutor behavior in a session, as a function of group performance. Specifically, we compute, within each session, the ratio of modified attention weights that are positive, and compare these ratios between the high performing and the low performing groups using a t-test (α = 0.05). We find that high performing group sessions have a larger percentage of positive weights than low performing ones (p = 0.015). Our personality-controlled weights operate by shifting the original attention weights up and down, and the personality mechanism tends to add more weight to the high performing groups' behavior segments. This result seems intuitive: groups with the right composition of personality configurations would behave more collaboratively, e.g., by being willing to communicate more, share more ideas, and be more engaged, and this is evident in the larger attention weights placed on their behaviors.
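This analysis step can be sketched as follows; the paper does not specify the t-test variant, so Welch's unequal-variance statistic is shown as one plausible choice:

```python
import numpy as np

def positive_weight_ratio(attn):
    """Fraction of a session's modified attention weights that are positive."""
    return float((np.asarray(attn) > 0).mean())

def welch_t(high_ratios, low_ratios):
    """Welch's t statistic comparing per-session positive-weight ratios
    of high- vs. low-performing groups."""
    a = np.asarray(high_ratios, dtype=float)
    b = np.asarray(low_ratios, dtype=float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return float((a.mean() - b.mean()) / np.sqrt(va + vb))
```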
4. Conclusion and Future Work
Personality is not only related to individual behavior patterns during interaction; the personality composition within a group also affects overall team performance, especially in small group collaborative task-solving interactions. In this work, we propose a novel Personality Composite-Network (P-CompN), which includes a personality network (PN) and an interlocutor acoustic network (IAN) that jointly integrate the effect of group members' personality attributes into the attention mechanism. We evaluate our P-CompN on a large-scale three-person interaction dataset of a school policy task and achieve a promising 70% accuracy in predicting group performance. Our analyses reveal several personality attribute configurations that are important to group performance and demonstrate the effect of a higher emphasis on behaviors for groups with higher collaborative effort. We will continue to advance our technical framework by including other non-verbal modalities (e.g., facial expressions and gestures), linguistic content, and conversation flow (e.g., question-answering patterns). Furthermore, by continuously collaborating with group scholars, we would like to investigate the complex interaction effect between the expressed behaviors and the group-level personality traits, and bring insights about specific interaction strategies that can help achieve more effective communication within a group discussion.
5. References
[1] S. T. Bell, “Deep-level composition variables as predictors of team
performance: a meta-analysis.” Journal of applied psychology,
vol. 92, no. 3, p. 595, 2007.
[2] A. Kramer, D. P. Bhave, and T. D. Johnson, “Personality and
group performance: The importance of personality composition
and work tasks,” Personality and Individual Differences, vol. 58,
pp. 132–137, 2014.
[3] R. L. Moreland, J. Levine, and M. Wingert, "Creating the ideal group: Composition effects at work," Understanding group behavior, vol. 2, pp. 11–35, 2013.
[4] B. H. Bradley, A. C. Klotz, B. E. Postlethwaite, and K. G. Brown, "Ready to rumble: How team personality composition and task conflict interact to improve performance," Journal of Applied Psychology, vol. 98, no. 2, p. 385, 2013.
[5] S. Mohammed and L. C. Angell, “Personality heterogeneity in
teams: Which differences make a difference for team perfor-
mance?” Small group research, vol. 34, no. 6, pp. 651–677, 2003.
[6] K. A. Jehn and E. A. Mannix, "The dynamic nature of conflict: A longitudinal study of intragroup conflict and group performance," Academy of Management Journal, vol. 44, no. 2, pp. 238–251, 2001.
[7] C. Beyan, V.-M. Katsageorgiou, and V. Murino, “A sequen-
tial data analysis approach to detect emergent leaders in small
groups,” IEEE Transactions on Multimedia, 2019.
[8] P. Dhani and T. Sharma, “Emotional intelligence and personality
traits as predictors of job performance of it employees,” Inter-
national Journal of Human Capital and Information Technology
Professionals (IJHCITP), vol. 9, no. 3, pp. 70–83, 2018.
[9] N. Attia, Big Five personality factors and individual performance. Université du Québec à Chicoutimi, 2013.
[10] S. Okada, L. S. Nguyen, O. Aran, and D. Gatica-Perez, “Model-
ing dyadic and group impressions with intermodal and interperson
features,” ACM Transactions on Multimedia Computing, Commu-
nications, and Applications (TOMM), vol. 15, no. 1s, p. 13, 2019.
[11] S. Fang, C. Achard, and S. Dubuisson, “Personality classification
and behaviour interpretation: An approach based on feature cate-
gories,” in Proceedings of the 18th ACM International Conference
on Multimodal Interaction. ACM, 2016, pp. 225–232.
[12] L. Batrinca, N. Mana, B. Lepri, N. Sebe, and F. Pianesi, “Mul-
timodal personality recognition in collaborative goal-oriented
tasks,” IEEE Transactions on Multimedia, vol. 18, no. 4, pp. 659–
673, 2016.
[13] Y.-S. Lin and C.-C. Lee, “Using interlocutor-modulated attention
blstm to predict personality traits in small group interaction,” in
Proceedings of the 2018 on International Conference on Multi-
modal Interaction. ACM, 2018, pp. 163–169.
[14] G. Murray and C. Oertel, “Predicting group performance in task-
based interaction,” in Proceedings of the 2018 on International
Conference on Multimodal Interaction. ACM, 2018, pp. 14–20.
[15] U. Avci and O. Aran, “Predicting the performance in decision-
making tasks: From individual cues to group interaction,” IEEE
Transactions on Multimedia, vol. 18, no. 4, pp. 643–658, 2016.
[16] J. E. Driskell, R. Hogan, and E. Salas, Personality and group per-
formance. Sage Publications, Inc, 1987.
[17] B. Wheeler and B. Mennecke, “The school of business policy task
manual,” 1992.
[18] L. R. Goldberg, “The development of markers for the big-five fac-
tor structure.” Psychological assessment, vol. 4, no. 1, p. 26, 1992.
[19] F. Eyben, K. R. Scherer, B. W. Schuller, J. Sundberg, E. André, C. Busso, L. Y. Devillers, J. Epps, P. Laukka, S. S. Narayanan et al., "The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing," IEEE Transactions on Affective Computing, vol. 7, no. 2, pp. 190–202, 2016.
[20] W. G. Graziano, E. C. Hair, and J. F. Finch, “Competitiveness
mediates the link between personality and group performance.”
Journal of Personality and Social Psychology, vol. 73, no. 6, p.
1394, 1997.
[21] G. A. Van Kleef, A. C. Homan, B. Beersma, and D. van Knippenberg, "On angry leaders and agreeable followers: How leaders' emotions and followers' personalities shape motivation and team performance," Psychological Science, vol. 21, no. 12, pp. 1827–1834, 2010.
[22] T. Sy, S. Côté, and R. Saavedra, "The contagious leader: Impact of the leader's mood on the mood of group members, group affective tone, and group processes," Journal of Applied Psychology, vol. 90, no. 2, p. 295, 2005.
[23] R. E. De Vries, B. Van den Hooff, and J. A. De Ridder, “Ex-
plaining knowledge sharing: The role of team communication
styles, job satisfaction, and performance beliefs,” Communication
research, vol. 33, no. 2, pp. 115–135, 2006.
... corresponding to these cues typically include statistics of signal energy and fundamental frequency (e.g., variation, maximum, mean), spectral features (formants, bandwidths, spectrum intensity), speaking rate (e.g., number of syllables per second) and local variability of the speech signal (e.g., jitter and shimmer) and Mel-Frequency Cepstral Coefficients (MFCCs). Due to the large number of speech features aimed at capturing vocal behavior, there have been attempts to identify standard features sets through the application of publicly available packages (e.g., OpenSMILE [52]) or meta-analysis of the literature (e.g., the Geneva Minimalistic Acoustic Parameter Set (GeMAPS) [212]). ...
... Deep Boltzmann Machines (DBM) were applied as an unsupervised feature learning method to jointly model body pose features in [21]. For sequential data processing, Long Short-Term Memory Networks (LSTM) [2,24,114,120,174,208,212], a type of Recurrent Neural Network (RNN), were applied for various problems and often in combination with CNNs. In [22], sequential data processing was performed with Conditional Restricted Boltzmann Machines (CRBM) [22] and a combination of RNNs with Restricted Boltzmann Machines (RNN-RBM) [22]. ...
... It has a high correlation with engagement, rapport, and even empathy. Through nonverbal behavior analysis, it is possible to detect whether there is a high/low group performance [212], to quantify interaction quality [105] or to predict group satisfaction level [106]. Below, we discuss each topic and the corresponding studies in depth and, Table 3 summarizes them. ...
Full-text available
This work presents a systematic review of recent efforts (since 2010) aimed at automatic analysis of nonverbal cues displayed in face-to-face co-located human-human social interactions. The main reason for focusing on nonverbal cues is that these are the physical, machine detectable traces of social and psychological phenomena. Therefore, detecting and understanding nonverbal cues means, at least to a certain extent, to detect and understand social and psychological phenomena. The covered topics are categorized into three as: a) modeling social traits, such as leadership, dominance, personality traits, b) social role recognition and social relations detection and c) interaction dynamics analysis in terms of group cohesion, empathy, rapport and so forth. We target the co-located interactions, in which the interactants are always humans. The survey covers a wide spectrum of settings and scenarios, including free-standing interactions, meetings, indoor and outdoor social exchanges, dyadic conversations, and crowd dynamics. For each of them, the survey considers the three main elements of nonverbal cues analysis, namely data, sensing approaches and computational methodologies. The goal is to highlight the main advances of the last decade, to point out existing limitations, and to outline future directions.
... Recently, computational research has progressed in developing methods that automatically predict group-level task performance from verbal/non-verbal behaviors during small group interactions [6,7], and some research has started to investigate joint modeling approach in considering the intertwining effect between member's vocal behaviors and intra-group personality compositions [8,9]. Where these past research has laid the solid foundation in predicting group performances using vocal behaviors by jointly modeling the effect of intra-group personality composition, these works do not leverage the inter-group personality structures into consideration. ...
... For each session, we first rank and label participants according to their speak times from the most to the least, e.g., we assign the interlocutors as either talkative or talk-less subject in the NTULP database. We train a Bi-GRU for each subject with personality re-weighted attention mechanism as in our previous work [9], defined as: ...
... Bi-GRU+ATT-Vocal Behavior Only Training a typical Bi-GRU for each subject with attention to perform recognition directly. Personality Network (PN)-Vocal Personality Only Using the PN model in our previous work [9], which uses 5-layer DNN on personality composite features to perform recognition. ...
... Batch size is fixed as [16,32], the max epoch is 1000, and optimizer is ADAMAX [29]. Additionally, we follow [65,66], which are the closest studies to us, to use an unweighted average recall (UAR) as our final evaluation metric. Zhong et al. [65,66] modeled the group-level personality composition for group performance classification. ...
... Finally, the whole framework is implemented using the PyTorch toolkit [49]. ...
Conference Paper
Full-text available
Physiological synchrony is a particular phenomenon of physiological responses during a face-to-face conversation. However, while many previous studies have proposed various physiological synchrony measures between interlocutors in dyadic conversations, there are very few works on computing physiological synchrony in small groups (three or more people). Moreover, belongingness and satisfaction are two important factors in a person's decision about which group to stay in. Therefore, in this preliminary work, we investigate and reveal the relationship between physiological synchrony and belongingness/satisfaction during group conversation. We feed the physiology of group members into a designed learnable graph structure with group-level physiological synchrony and heart-related features computed from photoplethysmography (PPG) signals. We then devise a Group-modulated Attentive Bi-directional Long Short-Term Memory (GGA-BLSTM) model to recognize three levels of belongingness and satisfaction (low, middle, and high) in groups. We evaluate the proposed method on our recently collected multimodal group interaction corpus (never published before), NTUBA, and the results show that (1) models trained jointly with group-level physiological synchrony and conventional heart-related features consistently outperform the model trained with the conventional features only, and (2) the proposed model with a Graph-structure Group-modulated Attention mechanism (GGA), GGA-BLSTM, performs better than the strong baseline model, the attentive BLSTM. The GGA-BLSTM achieves a promising unweighted average recall (UAR) of 73.3% and 82.1% on group satisfaction and belongingness classification tasks, respectively. In further analyses, we reveal the relationships between physiological synchrony and group satisfaction/belongingness.
... To be usable in practice, the z-score normalization of the EPQ scores is fitted on the training data and then applied to the test data. Furthermore, inspired by [49], [50], which computed statistics (e.g., mean, maximum, minimum) as measures of personality for each interaction unit, e.g., within a group, we not only include the raw EPQ scores but also compute seven statistics (difference, maximum, minimum, mean, standard deviation, lower quartile (quartile1), and upper quartile (quartile3)) between the interrogator and the deceiver (for each participant pair). ...
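The seven pairwise statistics listed in this excerpt can be sketched as follows (an illustrative reimplementation, not the authors' code; the quartile interpolation method is an assumption):

```python
import statistics

def pair_personality_stats(score_a, score_b):
    """Seven statistics over two interlocutors' EPQ scores on one trait:
    difference, maximum, minimum, mean, standard deviation, and the
    lower/upper quartiles of the pair."""
    pair = [score_a, score_b]
    # Inclusive quantiles interpolate between the two observed values.
    q1, _, q3 = statistics.quantiles(pair, n=4, method="inclusive")
    return {
        "difference": abs(score_a - score_b),
        "maximum": max(pair),
        "minimum": min(pair),
        "mean": statistics.mean(pair),
        "std": statistics.stdev(pair),
        "quartile1": q1,
        "quartile3": q3,
    }
```

For a two-person unit several of these statistics are redundant with the difference and the mean; the same function shape generalizes to larger interaction units, which is presumably why the full statistic set is used.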
Conference Paper
Full-text available
Deception occurs frequently in our lives. It is well known that people are generally not good at detecting deception; however, the behaviors of interlocutors during an interrogator-deceiver conversation may indicate whether the interrogator thinks the other person is telling deceptions or not. The ability to automatically recognize such perceived deception from behavior cues has the potential to advance technologies for improved deception prevention or enhanced persuasion skills. To investigate the feasibility of recognizing perceived deception from behaviors, we utilize a joint learning framework that considers acoustic-prosodic features, linguistic characteristics, language uses, and conversational temporal dynamics. We further incorporate personality attributes as an additional input to the recognition network. Our proposed model is evaluated on a recently collected Chinese deceptive corpus of dialog games. We achieve an unweighted average recall (UAR) of 86.70% and 84.89% on 2-class perceived deception-truth recognition tasks, given that the deceiver is telling either truths or lies, respectively. Further analyses reveal that 1) the deceiver's behaviors affect the interrogator's perception (e.g., higher intensity from the deceiver makes the interrogator believe their statements even when they are in fact deceptive), 2) the interrogator's behavior features carry information about their own deception perception (e.g., the interrogator's utterance duration is correlated with his/her perception of truth), and 3) personality traits indeed enhance perceived deception-truth recognition. Finally, we also demonstrate additional evidence indicating that humans are bad at detecting deception: there are very few indicators that overlap between perceived and produced truthful-deceptive behaviors.
A small group is a fundamental interaction unit for achieving a shared goal. Group performance can be automatically predicted using computational methods to analyze members’ verbal behavior in task-oriented interactions, as several recent works have shown. Most of the prior works focus on lower-level verbal behaviors, such as acoustics and turn-taking patterns, using either hand-crafted features or even advanced end-to-end methods. However, higher-level group-based communicative functions used between group members during conversations have not yet been considered. In this work, we propose a two-stage training framework that effectively integrates the communication function, as defined using Bales’ interaction process analysis (IPA) coding system, with the embedding learned from the low-level features in order to improve group performance prediction. Our results show a significant improvement compared to the state-of-the-art methods (4.241 MSE and 0.341 Pearson’s correlation on NTUBA-task1 and 3.794 MSE and 0.291 Pearson’s correlation on NTUBA-task2) on the NTUBA (National Taiwan University Business Administration) small-group interaction database. Furthermore, based on the design of IPA, our computational framework can provide a time-grained analysis of the group communication process and interpret the beneficial communicative behaviors for achieving better group performance.
Full-text available
This paper addresses the problem of predicting emergent leaders (ELs) in small groups, i.e., meetings. This is a long-standing research problem for social and organizational psychology and a relevant problem that has recently gained momentum in social computing. Towards this goal, we propose a novel method, which analyzes the temporal dependencies of the audiovisual data by applying unsupervised deep learning generative models (feature learning). To the best of our knowledge, this is the first attempt at sequential data processing for EL detection. Feature learning results in a single feature vector per given time interval, and all feature vectors representing a participant are aggregated using novel fusion techniques. Lastly, emergent leader detection is performed using state-of-the-art single and multiple kernel learning algorithms. The proposed method shows (significantly) improved results compared to the state-of-the-art methods, and it can be adapted to analyze various small group interactions, given that it is a general approach.
Full-text available
The chief aim of this article is to examine emotional intelligence (EI) and personality traits as predictors of the job performance of IT employees in India. To this end, data were collected from 158 middle-management employees working in the Indian IT sector through a random sampling method, with the help of three scales: DKEIT, JPI, and MPI. After data collection, the study carried out several statistical analyses, including frequency, correlation, and regression analysis, using SPSS version 23.0. The study findings report that both EI and personality traits impact the job performance of IT employees, i.e., both operate as predictors of the job performance of Indian IT employees. Based on this, the article gives a few recommendations to future researchers.
Conference Paper
Full-text available
This paper focuses on recognizing and understanding social dimensions (personality traits and social impressions) during small group interactions. We extract a set of audio and visual features, which are divided into three categories: intra-personal features (i.e., related to only one participant), dyadic features (i.e., related to a pair of participants), and one-vs-all features (i.e., related to one participant versus the other members of the group). First, we predict the personality traits (PT) and social impressions (SI) using these three feature categories. Then, we analyse the interplay between groups of features and the personality traits/social impressions of the interacting participants. The prediction is done using Support Vector Machines and Ridge Regression, which allow us to determine the most dominant features for each social dimension. Our experiments show that the combination of intra-personal and one-vs-all features can greatly improve the prediction accuracy of personality traits and social impressions. Prediction accuracy reaches 81.37% for the social impression named 'Rank of Dominance'. Finally, we draw some interesting conclusions about the relationship between personality traits/social impressions and social features.
Full-text available
Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size.
Full-text available
We examine whether group members’ Big Five personality composition (variability, minimum, and maximum) affects the group’s performance. We employed an experimental design where participants were paid based on their performance in two different group-based experimental tasks: an additive task (where group performance is based on the sum of efforts of all group members) and a conjunctive task (where group performance is based on the performance of the weakest group member). Results indicate that variability in extraversion is positively related to group performance on the additive task but not on the conjunctive task. Conversely, neuroticism maximum score is negatively related to group performance on the conjunctive task but not on the additive task.
This article proposes a novel feature-extraction framework for inferring impression personality traits, emergent leadership skills, communicative competence, and hiring decisions. The proposed framework extracts multimodal features describing each participant’s nonverbal activities. It captures intermodal and interperson relationships in interactions, including how the target interactor generates nonverbal behavior when other interactors also generate nonverbal behavior. The intermodal and interperson patterns are identified as frequently co-occurring events based on clustering of multimodal sequences. The proposed framework is applied to the SONVB corpus, an audiovisual dataset collected from dyadic job interviews, and the ELEA audiovisual data corpus, a dataset collected from group meetings. We evaluate the framework on a binary classification task involving 15 impression variables from the two data corpora. The experimental results show that the model trained with co-occurrence features is more accurate than previous models for 14 out of 15 traits.
Conference Paper
Small group interaction occurs often in workplace and education settings. Its dynamic progression is an essential factor dictating the final group performance outcomes. The personality of each individual within the group is reflected in his/her interpersonal behaviors with other members of the group as they engage in these task-oriented interactions. In this work, we propose an interlocutor-modulated attention BLSTM (IM-aBLSTM) architecture that models an individual's vocal behaviors during small group interactions in order to automatically infer his/her personality traits. The interlocutor-modulated attention mechanism jointly optimizes over the relevant interpersonal vocal behaviors of the other members of the group during interactions. Specifically, we evaluate our proposed IM-aBLSTM on one of the largest small group interaction databases, the ELEA corpus. Our framework achieves a promising unweighted recall accuracy of 87.9% on ten different binary personality trait prediction tasks, which outperforms the best results previously reported on the same database by 10.4% absolute. Finally, by analyzing the interpersonal vocal behaviors in regions of high attention weights, we observe several distinct intra- and inter-personal vocal behavior patterns that vary as a function of personality traits.
Conference Paper
We address the problem of automatically predicting group performance on a task, using multimodal features derived from the group conversation. These include acoustic features extracted from the speech signal, and linguistic features derived from the conversation transcripts. Because much work on social signal processing has focused on nonverbal features such as voice prosody and gestures, we explicitly investigate whether features of linguistic content are useful for predicting group performance. The conclusion is that the best-performing models utilize both linguistic and acoustic features, and that linguistic features alone can also yield good performance on this task. Because there is a relatively small amount of task data available, we present experimental approaches using domain adaptation and a simple data augmentation method, both of which yield drastic improvements in predictive performance, compared with a target-only model.
Incorporating research on personality recognition into computers, from both a cognitive and an engineering perspective, would facilitate interactions between humans and machines. Previous attempts at personality recognition have focused on a variety of corpora (ranging from text to audiovisual data), scenarios (interviews, meetings), channels of communication (audio, video, text), and different subsets of personality traits (out of the five from the Big Five Model). Our study uses simple acoustic and visual nonverbal features extracted from multimodal data, recorded in previously uninvestigated scenarios, and considers all five personality traits rather than just a subset. First, we look at the human-machine interaction scenario, where we introduce the display of different "collaboration levels." Second, we look at the contribution of the human-human interaction (HHI) scenario to the emergence of personality traits. Investigating the HHI scenario creates a stronger basis for future human-agent interactions. Our goal is to study, from a computational approach, the degree of emergence of the five personality traits in these two scenarios. The results demonstrate the relevance of each of the two scenarios when it comes to the degree of emergence of certain traits and the feasibility of automatically recognizing personality under different conditions.
This paper addresses the problem of predicting the performance of decision-making groups. Towards this goal, we evaluate the predictive power of group attributes and discussion dynamics by using automatically extracted features, such as group members' aural and visual cues, interaction between team members, and the influence of each team member, as well as self-reported features such as personality- and perception-related cues, the hierarchical structure of the group, and individual- and group-level task performances. We tackle the inference problem from two angles depending on the way features are extracted: 1) a holistic approach based on the entire meeting, and 2) a sequential approach based on thin slices of the meeting. In the former, key factors affecting group performance are identified and prediction is achieved by support vector machines. As for the latter, we compare and contrast the classification performance of a novel influence-model-based classifier with that of a hidden Markov model (HMM). Experimental results indicate that group looking cues and influence cues are major predictors of group performance and that the influence model outperforms the HMM in almost all experimental conditions. We also show that combining classifiers covering unique aspects of the data results in improved classification performance.