RESEARCH ARTICLE

Speaking out of turn: How video conferencing reduces vocal synchrony and collective intelligence
Maria Tomprou1*, Young Ji Kim2, Prerna Chikersal3, Anita Williams Woolley1, Laura A. Dabbish3

1 Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America, 2 Department of Communication, University of California, Santa Barbara, Santa Barbara, California, United States of America, 3 Human-Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America

* mtomprou@cmu.edu

Citation: Tomprou M, Kim YJ, Chikersal P, Woolley AW, Dabbish LA (2021) Speaking out of turn: How video conferencing reduces vocal synchrony and collective intelligence. PLoS ONE 16(3): e0247655. https://doi.org/10.1371/journal.pone.0247655

Editor: Marcus Perlman, University of Birmingham, United Kingdom

Received: August 5, 2020; Accepted: February 10, 2021; Published: March 18, 2021

Copyright: © 2021 Tomprou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data of the study are publicly available at https://osf.io/tnv93/.
Abstract
Collective intelligence (CI) is the ability of a group to solve a wide range of problems. Synchrony in nonverbal cues is critically important to the development of CI; however, extant findings are mostly based on studies conducted face-to-face. Given how much collaboration takes place via the internet, does nonverbal synchrony still matter, and can it be achieved when collaborators are physically separated? Here, we hypothesize and test the effect of nonverbal synchrony on CI that develops through visual and audio cues in physically separated teammates. We show that, contrary to popular belief, the presence of visual cues has no effect on CI; furthermore, teams without visual cues are more successful in synchronizing their vocal cues and speaking turns, and when they do so, they have higher CI. Our findings show that nonverbal synchrony is important in distributed collaboration and call into question the necessity of video support.
Introduction
In order to survive, members of social species need to find ways to coordinate and collaborate with each other [1]. Over a number of decades, scientists have come to study the collaboration ability of collectives within a framework of collective intelligence, exploring the mechanisms that enable groups to effectively collaborate to accomplish a wide variety of functions [2–6].
Recent research demonstrates that, like other species, human groups exhibit "collective intelligence" (CI), defined as a group's ability to solve a wide range of problems [2,3]. As humans are a more cerebral species, researchers have thought that their group performance depends largely on verbal communication and a high investment of time in interpersonal relationships that foster the development of trust and attachment [7,8]. However, more recent research on collective intelligence in human groups illustrates that it forms rather quickly [2], is partially dependent on members' ability to pick up on subtle, nonverbal cues [9–11], and is strongly associated with teams' ability to engage in tacit coordination, or coordination without verbal communication [12]. This suggests that there is likely a so-called deep structure to CI in human groups, with nonverbal and physiological underpinnings [12,13], just as is the case in other social species [14,15].
Existing research suggests that nonverbal cues, and their synchronization, play an important role in human collaboration and CI [10]. Nonverbal cues encompass all the messages other than words that people exchange in interactive contexts. Researchers consider nonverbal cues more reliable than verbal cues in conveying emotion and relational messages [16] and find that nonverbal cues are important for regulating the pace and flow of communication between interacting partners [17,18]. The literature on interpersonal coordination explores many forms of synchrony [19,20], but the common view is that synchrony is achieved when two or more nonverbal cues or behaviors are aligned [21,22]. Social psychology researchers traditionally study synchrony in terms of body movements, such as leg movements [23], body posture sway [24,25], finger tapping [26], and dancing [27]. These forms of synchrony contribute to interpersonal liking, cohesion, and coordination in relatively simple tasks [28,29]. Synchrony in facial muscle activity [30] and in prosodic cues such as vocal pitch and voice quality [31–33] is of particular importance for the coordination of interacting group members, as it facilitates both communication and interpersonal closeness. For example, synchrony in facial cues has been consistently found to indicate partners' liking for each other and cohesion [30].
While humans in general tend to synchronize with others, interaction partners also vary in the level of synchrony they achieve. The level of synchrony in a group can be influenced by the qualities of existing relationships [34] but also by the characteristics of individual team members; for instance, individuals who are more prosocial [35] and more attentive to social cues [10,36] are more likely to achieve synchrony and cooperation with interaction partners. Consistent with the link between synchrony and cooperation, recent studies demonstrate that greater synchrony in teams is associated with better performance [37,38].
Among the elements that nonverbal cues coordinate is spoken communication, particularly conversational speaking turns, wherein partners regulate nonverbal cues to signal their intention to maintain or yield turns [39]. Conversational turn-taking has fairly primitive origins: it is observed in other species, emerges in infants prior to linguistic competence, and is evident in spoken languages around the world [40]. The equality with which interaction partners speak varies, however, and partners who distribute speaking turns more equally consistently exhibit higher collective intelligence [2,11]. The negative effect of speaking inequality on collective intelligence has been demonstrated both in face-to-face and online interactions [11].
The majority of existing studies on synchrony were conducted in face-to-face environments [20,30,41] and focused on the relationship between synchrony and cohesion. We have a limited understanding of how synchrony relates to collective intelligence, particularly when group members are not collocated and collaborate on an ad hoc basis, a form of modern organization that has become increasingly common [42,43]. Given the exponential growth in the use of technology to mediate human relationships [44,45], an important question is whether synchrony in common nonverbal communication cues of face-to-face interaction, such as facial expression and tone of voice, still plays a role in human problem-solving and collaboration in mediated contexts, and how the role of different cues changes based on the communication medium used.
Researchers and managers alike assume that the closer a technology-mediated interaction is to face-to-face interaction (by including the full range of nonverbal cues, e.g., visual, audio, physical environment), the better it will be at fostering high-quality collaboration [46–48].
The idea that having more cues available helps collaborators bridge distance is strongly represented in both the management literature [49,50] and lay theory [51]. However, some empirical research suggests that visual cue availability may not always be superior to audio cues alone. In the absence of visual cues, communicators can effectively compensate, seek social information, and develop relationships in technology-mediated environments [52–55]. Indeed, in some cases, task-performing groups find their partners more satisfactory and trustworthy in audio-only settings than in audiovisual settings [56,57], suggesting that visual cues may serve as distractors in some conditions.
Purpose of the study and hypotheses
The primary goal of this research is to understand whether physically distributed collaborators develop nonverbal synchrony, and how variation in audio-visual cue availability during collaboration affects nonverbal synchrony and collective intelligence. Specifically, we test whether nonverbal synchrony, an implicit signal of coordination, is a mechanism through which communication technologies affect collective intelligence. Previous research defines nonverbal synchrony as any type of synchronous movement and vocalization that involves the matching of actions in time with others [23]. This study focuses on two types of nonverbal synchrony that are particularly relevant to the quality of communication and are available through virtual collaboration and interaction: facial expression synchrony and prosodic synchrony. We hypothesize that in environments where people have access to both visual and audio cues, collective intelligence will develop through facial expression synchrony as a coordination mechanism. When visual cues are absent, however, we anticipate that interacting partners will reach higher levels of collective intelligence through prosodic synchrony. We also examine whether facial expression synchrony develops and affects collective intelligence even in the absence of visual cues; if it does, this would suggest that this type of synchrony forms, at least in part, from similarity in partners' internal reactions to shared experiences rather than simply as reactions to each other's facial expressions. If facial expression synchrony is important for CI only when partners can see each other, it would suggest that the expressions play a predominantly social communication role under those conditions, and that partners' joint attention to these signals is an indicator of the quality of their communication. To explore these predictions, we conducted an experiment with two different conditions of distributed collaboration, one with no video access to collaboration partners (Condition 1) and one with video access (Condition 2), to disentangle how the types of cues available affect the type of synchrony that forms and its implications for collective intelligence.
Method
Participant recruitment and data collection
Our final sample included 198 individuals (99 dyads; 49 in Condition 1 and 50 in Condition 2). We recruited 292 individuals from a research participation pool of a northeastern university in the United States and randomly assigned them into 146 dyads (59 in Condition 1 and 87 in Condition 2). Due to technical problems with audio recording, ten dyads in Condition 1 and 37 dyads in Condition 2 had missing audio data, resulting in 62% valid responses. To test for possible bias introduced by missing data, we conducted independent-sample t-tests to assess any differences in demographics between the dyads retained and those excluded due to technical difficulties; no differences were detected (see S1 Appendix). All participants signed an informed consent form.
The average age in the sample was 24.82 years (SD = 7.18); ninety-six participants (48.7%) were female. The sample was racially diverse: 50% Asian or Pacific Islander, 33% White or Caucasian, 7% Black or African American, 2.5% Latin or Hispanic, and 6.6% from other or multiple races. Carnegie Mellon University's Institutional Review Board approved all materials and procedures in our study. The participant in Fig 1 provided written informed consent to publish their case details.
The procedure was the same in both conditions, except that in Condition 1 there was no camera and participants could only hear each other through an audio connection, whereas in Condition 2 participants could also see each other through a video connection. The two conditions had approximately equal numbers of dyads of each gender composition (i.e., no-female, one-female, and all-female dyads). Each session lasted about 30 minutes. Members of each dyad were seated in two separate rooms. After participants completed the pre-test survey independently, they initiated a conference call with their partner. Participants logged onto the Platform for Online Group Studies (POGS: pogs.mit.edu), a web browser-based platform supporting synchronous multiplayer interaction, to complete the Test of Collective Intelligence (TCI) with their partner [2,11]. The TCI contained six tasks ranging from 2 to 6 minutes each, and instructions were displayed before each task for 15 seconds to 1.5 minutes. At the end of the test, participants were instructed to sign off the conference call. Participants were then compensated and debriefed. A laboratory protocol for this study has been published with its own DOI.
Measures
Collective intelligence. Collective intelligence was measured using the Test of Collective Intelligence (TCI), completed by dyads working together. The TCI is an online version of the collective intelligence battery of tests used in [2], which contains a wide range of group tasks [11,58]. The TCI was adapted into an online tool to allow researchers to administer the test in a standardized way, even when participants are not collocated. Participants completed six tasks representing a variety of group processes (e.g., generating, deciding, executing, remembering) in a sequential order (see the study's protocol). To obtain collective intelligence scores for all dyads, we first scored each of the six tasks and then standardized the raw task scores. We then computed an unweighted mean of the six standardized scores, a method adapted from prior research on collective intelligence [58]. Cronbach's alpha for the reliability of the TCI scores was .81.

Fig 1. This flowchart illustrates the methodology used to transform the raw data of each participant into individual signals or measures from which synchrony and spoken communication features are calculated.
https://doi.org/10.1371/journal.pone.0247655.g001
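To make the scoring concrete, here is a minimal sketch in Python of the procedure just described, assuming a 99 × 6 array of raw task scores; the array contents below are placeholders, not study data.

```python
import numpy as np

# Placeholder input: raw scores for 99 dyads on the 6 TCI tasks (rows = dyads).
raw = np.random.default_rng(0).normal(size=(99, 6))

# Standardize each task across dyads, then take the unweighted mean of the
# six standardized scores to obtain each dyad's CI score, as in [58].
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
ci_scores = z.mean(axis=1)

# Cronbach's alpha over the six standardized task scores.
k = z.shape[1]
alpha = k / (k - 1) * (1 - z.var(axis=0, ddof=1).sum() / z.sum(axis=1).var(ddof=1))
```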
Facial expressions. We used OpenFace [59] to automatically detect facial movements in each frame, based on the Facial Action Coding System (FACS). We categorized these facial movements as positive (AU12, lip corner puller, with or without AU6, cheek raiser), negative (AU15, lip corner depressor, together with AU1, inner brow raiser, and/or AU4, brow lowerer), or other expressions (i.e., everything else occurring at low frequency that may be random). Facial expression synchrony of the dyad is a variable encoding the synchrony between the coded facial expression signals of the partners.
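A sketch of this coding rule, applied to a per-frame OpenFace output file. The AU presence columns (AU01_c, AU04_c, etc.) are part of OpenFace's standard CSV output; the grouping of AU15 with AU1 and/or AU4 follows our reading of the rule above and is an assumption.

```python
import pandas as pd

def code_expressions(openface_csv):
    """Collapse per-frame action units into a categorical expression signal:
    +1 = positive, -1 = negative, 0 = other."""
    df = pd.read_csv(openface_csv)
    df.columns = df.columns.str.strip()        # OpenFace pads column names
    au = lambda n: df[f"AU{n:02d}_c"] == 1     # _c columns code AU presence
    positive = au(12)                          # lip corner puller (with or without AU6)
    negative = au(15) & (au(1) | au(4))        # lip corner depressor + brow cues
    signal = pd.Series(0, index=df.index)      # default: "other"
    signal[negative] = -1
    signal[positive] = 1                       # positive takes precedence if both fire
    return signal
```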
Prosodic features. Prosodic characteristics of speech contribute to linguistic functions such as intonation, tone, stress, and rhythm. We used OpenSMILE [60] to extract 16 prosodic features over time from the audio recording of each participant. These features included pitch, loudness, and voice quality, as well as the frame-to-frame differences (deltas) between them. We conducted a principal components analysis with varimax rotation and used the first factor extracted, which accounted for 55.87% of the variance in the data. The first factor included four prosodic features: pitch, jitter, shimmer, and harmonics-to-noise ratio. Pitch is the fundamental frequency (F0); jitter, shimmer, and harmonics-to-noise ratio are the three features that index voice quality [61]. Jitter describes pitch variation in the voice, which is perceived as sound roughness. Shimmer describes the fluctuation of loudness in the voice. Harmonics-to-noise ratio captures perceived hoarseness. Previous research has also identified these features as important in predicting quality in social interactions [62]. All features were normalized using z-scores to account for individual differences in range. Speaker diarization was not needed, as the speech of each participant was recorded in a separate file.
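The sketch below illustrates the feature-reduction step, assuming a frames × 16 matrix of OpenSMILE features per speaker. scikit-learn exposes varimax rotation through FactorAnalysis rather than PCA, so we use it here as a close stand-in for the rotated-components analysis reported above; the placeholder matrix is not study data.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Placeholder input: one speaker's 16 prosodic features over 5,000 frames.
X = np.random.default_rng(1).normal(size=(5000, 16))

# z-score each feature within speaker to remove individual range differences.
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Varimax-rotated factor solution; keep the first factor (pitch, jitter,
# shimmer, and harmonics-to-noise ratio loaded on it in the study).
fa = FactorAnalysis(n_components=4, rotation="varimax").fit(Xz)
prosody_signal = fa.transform(Xz)[:, 0]   # one value per frame
```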
Nonverbal synchrony. Fig 1 illustrates how the raw data of each participant were transformed into individual signals or measures. These individual signals were then used to calculate dyadic synchrony in facial expressions and prosodic features, speaking turn inequality, and the amount of overall communication. We computed synchrony in facial expressions (coded as positive, negative, and other in each frame) and in prosodic features between partners for each dyad using Dynamic Time Warping (DTW). DTW takes two signals and warps them in a nonlinear manner to match them with each other, adjusting for different speeds. It then returns the distance between the warped signals. The lower this distance, the higher the synchrony between members of the dyad. Hence, we reversed the sign of the DTW distance measure to facilitate its interpretation as a measure of synchrony. We used DTW instead of other distance metrics, such as the Pearson correlation or simple Euclidean distance, because DTW is able to match similar behaviors of different duration that occur a few seconds apart, which better captures the responsive, social nature of these expressions (see comparison in Fig 2). For both facial expressions and prosodic features, we calculated synchrony across the six tasks of the TCI.

Fig 2. Dynamic Time Warping (DTW) is a better measure of behavioral synchrony than Euclidean distance because it is able to match similar behaviors of different duration that occur a few seconds apart.
https://doi.org/10.1371/journal.pone.0247655.g002
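A minimal DTW implementation over two one-dimensional signals, with synchrony taken as the negated distance. The paper's exact DTW settings (cost function, windowing) are not specified here, so treat this as an illustration of the idea rather than a reproduction of the pipeline.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(n*m) dynamic time warping distance between 1-D signals."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Each cell extends the cheapest of the three allowed warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two similar signals, shifted in time: DTW still aligns them closely.
a = np.sin(np.linspace(0, 6, 200))
b = np.sin(np.linspace(0.5, 6.5, 200))
synchrony = -dtw_distance(a, b)   # higher (less negative) = more synchronous
```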
Spoken communication. We computed two features of spoken communication: speaking turn inequality and the amount of overall spoken communication in the dyad. To compute features related to the number of speaking turns, we first identified speaking turns in the audio recordings of each dyad. All audio frames for which Covarep [63] returned a voicing probability over .80 were considered to contain speech. We extracted turns using the following process [64]. First, only one person can hold a turn at a given time. A turn passes from person A to person B if person A stops speaking before person B starts. If person B interrupts person A, the turn passes from A to B only if A stops speaking before B stops. If person A pauses for longer than one second, A's turn ends. When both participants are silent for more than one second, no one holds the turn. We heuristically chose the threshold of one second, since the pauses between most words in English are shorter than one second [64]. To measure speaking turn inequality, we computed the absolute difference between the total numbers of turns of the two partners in the dyad. To measure the amount of overall spoken communication, we summed the total number of samples of speech (i.e., the amount of time each person spoke with voicing probability > .80) across both partners in the dyad.
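The turn-passing rules above translate into a short event-driven pass over both speakers' speech segments. This is a sketch under stated assumptions: input segments are (start, end) times where voicing probability exceeded .80, and tie-breaking details not covered by the rules are our own choices.

```python
import numpy as np

def extract_turns(segs_a, segs_b, pause=1.0):
    """Count speaking turns per speaker from per-speaker speech segments."""
    segs = sorted([(s, e, "A") for s, e in segs_a] +
                  [(s, e, "B") for s, e in segs_b])
    holder, holder_end = None, -np.inf
    turns = {"A": 0, "B": 0}
    for start, end, who in segs:
        if holder is None or start - holder_end > pause:
            # Silence longer than `pause`: the floor is open, so whoever
            # speaks next begins a new turn (even the previous holder).
            holder, holder_end = who, end
            turns[who] += 1
        elif who == holder:
            holder_end = max(holder_end, end)   # holder keeps talking
        elif start >= holder_end or end > holder_end:
            # Holder stopped before `who` started, or `who` interrupted
            # and outlasted the holder: the turn passes.
            holder, holder_end = who, end
            turns[who] += 1
        # Otherwise: a failed interruption; the holder keeps the turn.
    return turns

turns = extract_turns([(0.0, 2.1), (3.4, 5.0)], [(2.3, 3.2), (5.2, 7.9)])
speaking_turn_inequality = abs(turns["A"] - turns["B"])
```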
Social perceptiveness. At the beginning of the session, each participant completed the Reading the Mind in the Eyes (RME) test to assess their social perceptiveness [65], i.e., the ability to draw inferences about how others think or feel based on subtle nonverbal cues. Previous research has shown that social perceptiveness enhances interpersonal coordination [66] and collective intelligence [2,11]. The test consists of 36 images of the eye region of individual faces. Participants were asked to choose among possible mental states to describe what the person pictured was feeling or thinking. The options were complex mental states (e.g., guilt) rather than simple emotions (e.g., anger). Individual participants' scores were averaged for each dyad. We controlled for social perceptiveness in our analyses predicting CI because it is a consistent predictor of collective intelligence in prior work.
Demographics. We also collected demographic attributes such as race, age, education, and gender for each participant. As our level of analysis was the dyad, we calculated race similarity, age distance, education distance, and the number of females in the dyad.
Results
Table 1 provides bivariate correlations among study variables and descriptive statistics. We first examined whether collective intelligence differs as a function of video availability. An independent-samples t-test comparing our two experimental conditions (no video vs. video) revealed no significant difference in the observed level of collective intelligence (video: M = -.07, SD = .64; no video: M = .08, SD = .53; t(97) = -1.23, p = .22). Further, and surprisingly, the level of synchrony in facial expressions was also not significantly different between the two conditions; dyads with access to video did not synchronize facial expressions more than dyads without access to video (video: M = -7614.80, SD = 3472.92; no video: M = -7248.58, SD = 3167.11; t(97) = -.55, p = .56). By contrast, the difference in prosodic synchrony between the two conditions was significant; prosodic synchrony was significantly higher in dyads without access to video (video: M = -.32, SD = 1.18; no video: M = .26, SD = .72; t(97) = -2.95, p = .004).
Finally, partners' speaking turns were distributed significantly less equally in dyads with video than in dyads without video (speaking turn inequality, video: M = 26.31, SD = 22.96; no video: M = 9.14, SD = 5.63; t(97) = 5.13, p < .001).
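For readers following along, the condition comparisons reduce to independent-samples t-tests over per-dyad scores; a minimal sketch with SciPy follows, where the arrays are placeholders drawn to match the reported means and SDs rather than actual study data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
ci_video = rng.normal(-0.07, 0.64, 50)      # placeholder per-dyad CI, video
ci_no_video = rng.normal(0.08, 0.53, 49)    # placeholder per-dyad CI, no video

# Independent-samples t-test; the paper reports t(97) = -1.23, p = .22.
t, p = stats.ttest_ind(ci_video, ci_no_video)
df = len(ci_video) + len(ci_no_video) - 2
print(f"t({df}) = {t:.2f}, p = {p:.2f}")
```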
We further examined whether synchrony affects CI differently depending on the availability of video. Though collective intelligence did not differ with access to video, nor did the level of facial expression synchrony achieved, we found that synchrony in facial expressions positively predicted collective intelligence only in the video condition (see Fig 3; unstandardized coefficient for the conditional effect = .0001, t = 2.70, p = .01, bias-corrected bootstrap confidence interval [.0000, .0001]), suggesting that when video was available, facial expressions played more of a social role and partners jointly attended to them. Furthermore, social perceptiveness significantly predicted facial expression synchrony in the video condition (r = .31, p = .03), consistent with previous research [10], but not in the no-video condition (r = -.17, p = .25).
In addition, in the sample overall we found a main effect of prosodic synchrony on CI; controlling for covariates, prosodic synchrony significantly and positively predicted CI (b = .29, p = .003). We wondered why prosodic synchrony was higher in the no-video condition, so we explored other qualities of the dyads' speaking patterns, particularly the distribution of speaking turns, which, as discussed earlier, is an aspect of communication shown to be an important predictor of CI in prior studies [2,11].
Table 1. Correlation matrix for study variables and descriptive statistics.

|                                 | 1     | 2        | 3      | 4     | 5          | 6     | 7     | 8    | 9    | 10  | 11   |
|---------------------------------|-------|----------|--------|-------|------------|-------|-------|------|------|-----|------|
| 1. Collective intelligence      |       |          |        |       |            |       |       |      |      |     |      |
| 2. Facial expression synchrony  | .16   |          |        |       |            |       |       |      |      |     |      |
| 3. Prosodic synchrony           | .29** | .02      |        |       |            |       |       |      |      |     |      |
| 4. Speaking turn inequality     | -.13  | .10      | -.35** |       |            |       |       |      |      |     |      |
| 5. Overall spoken communication | -.24* | -.05     | -.10   | -.11  |            |       |       |      |      |     |      |
| 6. Video condition              | -.12  | -.05     | -.28** | .46** | -.16       |       |       |      |      |     |      |
| 7. Social perceptiveness        | .33** | .08      | .02    | .03   | .02        | -.04  |       |      |      |     |      |
| 8. Female number                | .15   | .04      | .07    | .00   | -.09       | .00   | .20*  |      |      |     |      |
| 9. Age distance                 | -.15  | -.04     | -.04   | .16   | -.06       | .36** | -.18  | -.12 |      |     |      |
| 10. Ethnic similarity           | -.02  | -.09     | .00    | -.02  | .08        | .05   | -.22* | -.00 | -.03 |     |      |
| 11. Education distance          | -.18  | .10      | -.19   | .05   | -.08       | .05   | -.19  | -.00 | .25* | .09 |      |
| Minimum                         | -1.64 | -27428   | -3.26  | 0     | 214221     | 0     | 17.5  | 0    | 0    | 0   | 0    |
| Maximum                         | 1.35  | -1617    | 1.63   | 82    | 16575414   | 1     | 32.5  | 2    | 49   | 4   | 4    |
| Mean                            | .00   | -7789.28 | 0      | 17.47 | 6765098.17 | -     | 26.25 | .98  | 5.64 | .36 | 1.25 |
| SD                              | .58   | 4206.59  | 1      | 18.44 | 3520702.91 | -     | 2.78  | .83  | 7.59 | .48 | 1.14 |

Note: * p < .05; ** p < .01; N = 99 dyads.
https://doi.org/10.1371/journal.pone.0247655.t001
Speaking turn inequality negatively predicted prosodic synchrony, controlling for covariates (b = -.35, p = .001). Mediation analyses showed that speaking turn inequality mediated the relationship between video condition and prosodic synchrony (effect size = .26, bias-corrected bootstrap confidence interval [.05, .44]). To test the causal pathway from video access through speaking turn inequality and prosodic synchrony to collective intelligence, we formally tested a serial mediation model. The serial mediation was significant (effect size = -.05, bias-corrected bootstrap confidence interval [-.09, -.018]; see Fig 4).
That is, video access leads to greater speaking turn inequality, which in turn decreases the dyad's prosodic synchrony, which then decreases the dyad's collective intelligence (see also Table 2). Note that an analysis of reverse causality, predicting speaking turn inequality from prosodic synchrony, was not supported as an alternative explanation.
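A compact sketch of how a serial indirect effect (video → turn inequality → prosodic synchrony → CI) can be bootstrapped from OLS path estimates. The study reports bias-corrected intervals and includes covariates; this percentile version without covariates is a simplification for illustration only.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients with an intercept column prepended to X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def serial_indirect(x, m1, m2, y):
    """Serial indirect effect a * d21 * b2 for x -> m1 -> m2 -> y."""
    a = ols(m1, x[:, None])[1]                    # x -> m1
    d21 = ols(m2, np.column_stack([x, m1]))[2]    # m1 -> m2, controlling x
    b2 = ols(y, np.column_stack([x, m1, m2]))[3]  # m2 -> y, controlling x, m1
    return a * d21 * b2

def bootstrap_ci(x, m1, m2, y, n_boot=5000, seed=0):
    """Percentile bootstrap CI for the serial indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = [serial_indirect(x[i], m1[i], m2[i], y[i])
             for i in (rng.integers(0, n, n) for _ in range(n_boot))]
    return np.percentile(draws, [2.5, 97.5])
```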
Fig 3. Interaction effects of facial expression synchrony and video access condition on collective intelligence.
https://doi.org/10.1371/journal.pone.0247655.g003
Fig 4. Serial mediation analysis of the effect of video access on collective intelligence.
https://doi.org/10.1371/journal.pone.0247655.g004
Discussion
We explored what role, if any, video access to partners plays in facilitating collaboration when partners are not collocated. Though we found no direct effects of video access on collective intelligence or facial expression synchrony, we did find that in the video condition, facial expression synchrony predicts collective intelligence. This result suggests that when visual cues are available, it is important that interaction partners attend to them. Furthermore, when video was available, social perceptiveness predicted facial expression synchrony, reinforcing the role this individual characteristic plays in heightening attention to available cues. We also found that prosodic synchrony improves collective intelligence in physically separated collaborators whether or not they have access to video. An important precursor to prosodic synchrony is the equality in speaking turns that emerges among collaborators, which enhances prosodic synchrony and, in turn, collective intelligence. Surprisingly, our findings suggest that video access may, in fact, impede the development of prosodic synchrony by creating greater speaking turn inequality, countering some prevailing assumptions about the importance of richer media for facilitating distributed collaboration.
Our findings build on existing research demonstrating that synchrony improves coordination [30,33] by showing that it also improves cognitive aspects of a group, such as joint problem-solving and collective intelligence in distributed collaboration.
Table 2. Summary of regression analyses for serial mediation.

Dependent variable: Speaking turn inequality (R² = .21, F(4,94) = 6.53, p = .001)

| Predictor                    | coefficient | se  | t    | p   | 95% CI lower | 95% CI upper |
|------------------------------|-------------|-----|------|-----|--------------|--------------|
| Constant                     | -.88        | .91 | -.97 | .33 | -2.69        | .92          |
| Social perceptiveness        | .02         | .03 | .59  | .55 | -.04         | .08          |
| Female number                | -.01        | .11 | -.14 | .88 | -.24         | .20          |
| Overall spoken communication | -.03        | .09 | -.38 | .69 | -.22         | .15          |
| Video condition              | .92         | .18 | 4.95 | .00 | .55          | 1.29         |

Dependent variable: Prosodic synchrony (R² = .17, F(5,93) = 3.85, p = .003)

| Predictor                    | coefficient | se  | t     | p   | 95% CI lower | 95% CI upper |
|------------------------------|-------------|-----|-------|-----|--------------|--------------|
| Constant                     | -.79        | .94 | -.83  | .40 | -2.67        | 1.08         |
| Social perceptiveness        | .00         | .03 | .16   | .87 | -.06         | .07          |
| Female number                | .06         | .11 | .54   | .58 | -.16         | .29          |
| Overall spoken communication | -.16        | .09 | -1.67 | .09 | -.35         | .03          |
| Video condition              | -.36        | .21 | -1.70 | .09 | -.79         | .06          |
| Speaking turn inequality     | -.28        | .10 | -2.63 | .00 | -.49         | -.07         |

Dependent variable: Collective intelligence (R² = .25, F(6,92) = 5.23, p = .001)

| Predictor                    | coefficient | se  | t     | p   | 95% CI lower | 95% CI upper |
|------------------------------|-------------|-----|-------|-----|--------------|--------------|
| Constant                     | -1.90       | .52 | -3.63 | .00 | -2.95        | -.86         |
| Social perceptiveness        | .06         | .01 | 3.51  | .00 | .02          | .10          |
| Female number                | .02         | .06 | .45   | .64 | -.09         | .15          |
| Overall spoken communication | -.14        | .05 | -2.58 | .01 | -.25         | -.03         |
| Video condition              | -.06        | .12 | -.54  | .58 | -.30         | .17          |
| Speaking turn inequality     | -.03        | .06 | -.63  | .52 | -.16         | .08          |
| Prosodic synchrony           | .12         | .05 | 2.23  | .02 | .01          | .24          |

Note. N = 99 dyads; video condition coded as 1, no-video condition coded as 0.
https://doi.org/10.1371/journal.pone.0247655.t002
Much of the previous research on synchrony has been conducted in face-to-face settings. We offer evidence that nonverbal synchrony can occur, and is important to the level of collective intelligence, in distributed collaboration. Furthermore, we demonstrate different pathways through which different types of cues can affect nonverbal synchrony and, in turn, collective intelligence. For example, prosodic synchrony and speaking turn equality appear to be important means of regulating collaboration. Speaking turns are a key communication mechanism in social interaction, regulating the pace at which communication proceeds, and are governed by a set of interaction rules such as yielding, requesting, or maintaining turns [18]. These rules are often subtly communicated through nonverbal cues such as eye contact and vocal cues (e.g., back channels, altering volume and rate) [18]. However, our findings suggest that visual nonverbal cues may also enable some interacting partners to dominate the conversation. By contrast, we show that when interacting partners have audio cues only, the lack of video does not hinder them from communicating these rules but instead helps them regulate their conversation more smoothly, by engaging in a more equal exchange of turns and by establishing improved prosodic synchrony. Previous research has focused largely on synchrony regulated by visual cues, such as studies showing that synchrony in facial expressions improves cohesion in collocated teams [30]. Our study underscores the importance of audio cues, which appear to be compromised by video access.
Our findings suggest several avenues for future research on nonverbal synchrony and human collaboration. For instance, how can we enhance prosodic synchrony? Some research has examined interventions that enhance speaking turn equality to improve decision-making effectiveness [67]; could regulating conversational behavior also increase prosodic synchrony? Furthermore, does nonverbal synchrony affect collective intelligence similarly in larger groups? As group size increases, a handful of team members tend to dominate the conversation [68], with implications for spoken communication, nonverbal synchrony, and ultimately collective intelligence. Our results also underscore the value of behavioral measures of collaboration quality as a complement to the dominant focus on self-report measures of attitudes and processes in the social sciences, because collaborators may not always report better collaborations despite exhibiting increased synchrony and collective intelligence [2,10]. Our study has limitations, which offer opportunities for future research. For example, our findings were observed in newly formed, non-recurring dyads in the laboratory. It remains to be seen whether they generalize to teams that are ongoing or whose members are more familiar with one another, as in the case of distributed teams in organizations. We encourage future research to test these findings in the field with organizational teams.
Overall, our findings enhance our understanding of the nonverbal cues that people rely on when collaborating with a distant partner via different communication media. As distributed collaboration increases as a form of work (e.g., virtual teams, crowdsourcing), this study suggests that collective intelligence will be a function of subtle cues and available modalities. Extrapolating from our results, one can argue that limited access to video may promote better communication and social interaction during collaborative problem solving, as there are fewer stimuli to distract collaborators. Consequently, groups may achieve better problem solving if new technologies offer fewer distractions and fewer visual stimuli.
Supporting information
S1 Appendix. t-test results comparing cases with valid and missing data.
(PDF)
Acknowledgments
We thank research assistants Thomas Rasmussen, Brian Hall, and Mikahla Vicino for their help with data collection. We are also grateful to Ella Glickson and Rosalind Chow for providing valuable feedback on earlier versions of this manuscript.

Funding
This material is based upon work supported by the National Science Foundation under grant numbers CNS-1205539 (https://www.nsf.gov/awardsearch/showAward?AWD_ID=1205539&HistoricalAwards=false), awarded to L.D.; OAC-1322278 (https://nsf.gov/awardsearch/showAward?AWD_ID=1322278), awarded to A.W.; and OAC-1322254 (https://nsf.gov/awardsearch/showAward?AWD_ID=1322254), awarded to A.W. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests
The authors have declared that no competing interests exist.
Author Contributions
Conceptualization: Maria Tomprou, Young Ji Kim, Prerna Chikersal, Anita Williams Woolley, Laura A. Dabbish.
Data curation: Prerna Chikersal.
Formal analysis: Maria Tomprou, Young Ji Kim.
Funding acquisition: Anita Williams Woolley, Laura A. Dabbish.
Investigation: Maria Tomprou, Prerna Chikersal.
Methodology: Maria Tomprou, Prerna Chikersal.
Project administration: Laura A. Dabbish.
Resources: Anita Williams Woolley, Laura A. Dabbish.
Software: Prerna Chikersal, Laura A. Dabbish.
Supervision: Anita Williams Woolley, Laura A. Dabbish.
Writing – original draft: Maria Tomprou, Young Ji Kim, Prerna Chikersal, Anita Williams Woolley.
Writing – review & editing: Maria Tomprou, Young Ji Kim, Prerna Chikersal, Anita Williams Woolley, Laura A. Dabbish.
References
1. Bear A, Rand DG. Intuition, deliberation, and the evolution of cooperation. Proceedings of the National
Academy of Sciences. 2016; 113(4):936–941. https://doi.org/10.1073/pnas.1517780113 PMID:
26755603
2. Woolley AW, Chabris CF, Pentland A, Hashmi N, Malone TW. Evidence for a collective intelligence fac-
tor in the performance of human groups. Science. 2010; 330(6004):686–688. https://doi.org/10.1126/
science.1193147 PMID: 20929725
3. Bernstein E, Shore J, Lazer D. How intermittent breaks in interaction improve collective intelligence.
Proceedings of the National Academy of Sciences. 2018; p.8734–8739. https://doi.org/10.1073/pnas.
1802407115 PMID: 30104371
4. Bonabeau E, Dorigo M, Theraulaz G. Inspiration for optimization from social insect behaviour. Nature.
2000; 406(6791): 39–42. https://doi.org/10.1038/35017500 PMID: 10894532
5. Hong L, Page SE. Groups of diverse problem solvers can outperform groups of high-ability problem
solvers. Proceedings of the National Academy of Sciences. 2004; 101(46):16385–16389. https://doi.
org/10.1073/pnas.0403723101 PMID: 15534225
6. Kittur A, Kraut RE. Harnessing the wisdom of crowds in wikipedia: quality through coordination. In: Pro-
ceedings of the 2008 ACM conference on Computer supported cooperative work. ACM; 2008. p. 37–
46.
7. Dirks KT. The effects of interpersonal trust on work group performance. Journal of Applied Psychology.
1999; 84(3):445. https://doi.org/10.1037/0021-9010.84.3.445 PMID: 10380424
8. Lindskold S. Trust development, the GRIT proposal, and the effects of conciliatory acts on conflict and
cooperation. Psychological Bulletin. 1978; 85(4):772. https://doi.org/10.1037/0033-2909.85.4.772
9. Carney DR, Harrigan JA. It takes one to know one: Interpersonal sensitivity is related to accurate
assessments of others’ interpersonal sensitivity. Emotion. 2003; 3(2):194–200. https://doi.org/10.1037/
1528-3542.3.2.194 PMID: 12899418
10. Chikersal P, Tomprou M, Kim YJ, Woolley AW, Dabbish L. Deep Structures of Collaboration: Physiolog-
ical Correlates of Collective Intelligence and Group Satisfaction. In: Proceedings of the 2017 ACM con-
ference on Computer supported cooperative work; 2017. p. 873–888.
11. Engel D, Woolley AW, Jing LX, Chabris CF, Malone TW. Reading the mind in the eyes or reading
between the lines? Theory of mind predicts collective intelligence equally well online and face-to-face.
PloS one. 2014; 9(12):e115212. https://doi.org/10.1371/journal.pone.0115212 PMID: 25514387
12. Aggarwal I, Woolley AW, Chabris CF, Malone TW. The impact of cognitive style diversity on implicit
learning in teams. Frontiers in Psychology. 2019; 10:112. https://doi.org/10.3389/fpsyg.2019.00112
PMID: 30792672
13. Akinola M, Page-Gould E, Mehta PH, Lu JG. Collective hormonal profiles predict group performance.
Proceedings of the National Academy of Sciences. 2016; 113(35):9774–9779. https://doi.org/10.1073/
pnas.1603443113 PMID: 27528679
14. Berdahl A, Torney CJ, Ioannou CC, Faria JJ, Couzin ID. Emergent sensing of complex environments by
mobile animal groups. Science. 2013; 339(6119):574–576. https://doi.org/10.1126/science.1225883
PMID: 23372013
15. Gordon DM. Collective wisdom of ants. Scientific American. 2016; 314(2):44–47. https://doi.org/10.
1038/scientificamerican0216-44 PMID: 26930827
16. Guerrero LK, DeVito JA, Hecht ML. The nonverbal communication reader: Classic and contemporary
readings. Waveland Press Prospect Heights, IL; 1999.
17. Duncan S. Some signals and rules for taking speaking turns in conversations. Journal of Personality
and Social Psychology. 1972; 23(2):283–292. https://doi.org/10.1037/h0033031
18. Knapp ML, Hall JA, Horgan TG. Nonverbal communication in human interaction. Cengage Learning;
2013.
19. Bernieri FJ, Davis JM, Rosenthal R, Knee CR. Interactional synchrony and rapport: Measuring syn-
chrony in displays devoid of sound and facial affect. Personality and Social Psychology Bulletin. 1994;
20(3):303–311. https://doi.org/10.1177/0146167294203008
20. Vacharkulksemsuk T, Fredrickson BL. Strangers in sync: Achieving embodied rapport through shared
movements. Journal of Experimental Social Psychology. 2012; 48(1):399–402. https://doi.org/10.1016/
j.jesp.2011.07.015 PMID: 22389521
21. Miles LK, Griffiths JL, Richardson MJ, Macrae CN. Too late to coordinate: Contextual influences on
behavioral synchrony. European Journal of Social Psychology. 2010; 40(1):52–60.
22. Konvalinka I, Xygalatas D, Bulbulia J, Schjødt U, Jegindø EM, Wallot S, et al. Synchronized arousal
between performers and related spectators in a fire-walking ritual. Proceedings of the National Acad-
emy of Sciences. 2011; 108(20):8514–8519. https://doi.org/10.1073/pnas.1016955108
23. Wiltermuth SS, Heath C. Synchrony and cooperation. Psychological Science. 2009; 20(1):1–5. https://
doi.org/10.1111/j.1467-9280.2008.02253.x PMID: 19152536
24. Lakens D. Movement synchrony and perceived entitativity. Journal of Experimental Social Psychology.
2010; 46(5):701–708. https://doi.org/10.1016/j.jesp.2010.03.015
25. Valdesolo P, Ouyang J, DeSteno D. The rhythm of joint action: Synchrony promotes cooperative ability.
Journal of Experimental Social Psychology. 2010; 46(4):693–695. https://doi.org/10.1016/j.jesp.2010.
03.004
26. Oullier O, De Guzman GC, Jantzen KJ, Lagarde J, Scott Kelso JA. Social coordination dynamics: Mea-
suring human bonding. Social Neuroscience. 2008; 3(2):178–192. https://doi.org/10.1080/
17470910701563392 PMID: 18552971
27. Kirschner S, Tomasello M. Joint music making promotes prosocial behavior in 4-year-old children. Evo-
lution and Human Behavior. 2010; 31(5):354–364. https://doi.org/10.1016/j.evolhumbehav.2010.04.
004
28. Baimel A, Birch SA, Norenzayan A. Coordinating bodies and minds: Behavioral synchrony fosters men-
talizing. Journal of Experimental Social Psychology. 2018; 74:281–290. https://doi.org/10.1016/j.jesp.
2017.10.008
29. Vicaria IM, Dickens L. Meta-analyses of the intra-and interpersonal outcomes of interpersonal
coordination. Journal of Nonverbal Behavior. 2016; 40(4):335–361. https://doi.org/10.1007/s10919-
016-0238-8
30. Mønster D, Håkonsson DD, Eskildsen JK, Wallot S. Physiological evidence of interpersonal dynamics
in a cooperative production task. Physiology & behavior. 2016; 156:24–34. https://doi.org/10.1016/j.
physbeh.2016.01.004
31. Coulston R, Oviatt S, Darves C. Amplitude convergence in children’s conversational speech with ani-
mated personas. In: Seventh International Conference on Spoken Language Processing; 2002.
32. Lubold N, Pon-Barry H. A comparison of acoustic-prosodic entrainment in face-to-face and remote col-
laborative learning dialogues. In: Spoken Language Technology Workshop (SLT), 2014 IEEE. IEEE;
2014. p. 288–293.
33. Lubold N, Pon-Barry H. Acoustic-prosodic entrainment and rapport in collaborative learning dialogues.
In: Proceedings of the 2014 ACM workshop on Multimodal Learning Analytics Workshop and Grand
Challenge. ACM; 2014. p. 5–12.
34. Julien D, Brault M, Chartrand É, Bégin J. Immediacy behaviours and synchrony in satisfied and dissatisfied couples. Canadian Journal of Behavioural Science/Revue canadienne des sciences du comportement. 2000; 32(2):84. https://doi.org/10.1037/h0087103
35. Lumsden J, Miles LK, Richardson MJ, Smith CA, Macrae CN. Who syncs? Social motives and interper-
sonal coordination. Journal of Experimental Social Psychology. 2012; 48(3):746–751. https://doi.org/
10.1016/j.jesp.2011.12.007
36. Krych-Appelbaum M, Law JB, Jones D, Barnacz A, Johnson A, Keenan JP. I think I know what you
mean: The role of theory of mind in collaborative communication. Interaction Studies. 2007; 8(2):267–
280. https://doi.org/10.1075/is.8.2.05kry
37. Curhan JR, Pentland A. Thin slices of negotiation: Predicting outcomes from conversational dynamics
within the first 5 minutes. Journal of Applied Psychology. 2007; 92(3):802–811. https://doi.org/10.1037/
0021-9010.92.3.802 PMID: 17484559
38. Riedl C, Woolley AW. Teams vs. crowds: A field test of the relative contribution of incentives, member
ability, and emergent collaboration to crowd-based problem solving performance. Academy of Manage-
ment Discoveries. 2017; 3(4):382–403. https://doi.org/10.5465/amd.2015.0097
39. Wiemann JM, Knapp ML. Turn-taking in conversations. Journal of Communication. 1975; 25(2):75–92.
https://doi.org/10.1111/j.1460-2466.1975.tb00582.x
40. Levinson SC. Turn-taking in human communication – origins and implications for language processing. Trends in Cognitive Sciences. 2016; 20(1):6–14. https://doi.org/10.1016/j.tics.2015.10.010 PMID: 26651245
41. Van Baaren RB, Holland RW, Kawakami K, Van Knippenberg A. Mimicry and prosocial behavior. Psy-
chological Science. 2004; 15(1):71–74. https://doi.org/10.1111/j.0963-7214.2004.01501012.x PMID:
14717835
42. Valentine MA, Retelny D, To A, Rahmati N, Doshi T, Bernstein MS. Flash organizations: Crowdsourcing
complex work by structuring crowds as organizations. In: Proceedings of the 2017 CHI Conference on
Human Factors in Computing Systems. ACM; 2017. p. 3523–3537.
43. Lodato TJ, DiSalvo C. Issue-oriented hackathons as material participation. New Media & Society. 2016;
18(4):539–557. https://doi.org/10.1177/1461444816629467
44. O’Mahony S, Barley SR. Do digital telecommunications affect work and organization? The state of our
knowledge. Research in Organizational Behavior, VOL 21, 1999. 1999; 21:125–161.
45. Johnson DW, Johnson RT. Cooperation and the use of technology. Handbook of research for educa-
tional communications and technology: A project of the Association for Educational Communications
and Technology. 1996; p. 1017–1044.
46. Culnan MJ, Markus ML. Information technologies. Sage Publications, Inc; 1987.
47. Daft RL, Lengel RH. Organizational information requirements, media richness and structural design.
Management Science. 1986; 32(5):554–571. https://doi.org/10.1287/mnsc.32.5.554
48. Short J, Williams E, Christie B. The social psychology of telecommunications. John Wiley and Sons
Ltd; 1976.
49. Marlow SL, Lacerenza C, Salas E. Communication in virtual teams: A conceptual framework and
research agenda. Human Resource Management Review. 2017; 27(4):575–589. https://doi.org/10.
1016/j.hrmr.2016.12.005
50. Schulze J, Krumm S. The virtual team player: A review and initial model of knowledge, skills, abilities,
and other characteristics for virtual collaboration. Organizational Psychology Review. 2017; 7(1):66–95.
https://doi.org/10.1177/2041386616675522
51. Team I. Optimizing Team Performance: How and Why Video Conferencing Trumps Audio. Forbes
Insights. 2017.
52. Ramirez A Jr, Walther JB, Burgoon JK, Sunnafrank M. Information-seeking strategies, uncertainty, and
computer-mediated communication: Toward a conceptual model. Human Communication Research.
2002; 28(2):213–228. https://doi.org/10.1111/j.1468-2958.2002.tb00804.x
53. Walther JB. Interpersonal effects in computer-mediated interaction: A relational perspective. Communi-
cation Rresearch. 1992; 19(1):52–90. https://doi.org/10.1177/009365092019001003
54. Walther JB. Computer-mediated communication: Impersonal, interpersonal, and hyperpersonal interac-
tion. Communication Research. 1996; 23(1):3–43. https://doi.org/10.1177/009365096023001001
55. Walther JB, Burgoon JK. Relational communication in computer-mediated interaction. Human Commu-
nication Research. 1992; 19(1):50–88. https://doi.org/10.1111/j.1468-2958.1992.tb00295.x
56. Burgoon JK, Bonito JA, Ramirez A Jr, Dunbar NE, Kam K, Fischer J. Testing the interactivity principle:
Effects of mediation, propinquity, and verbal and nonverbal modalities in interpersonal interaction. Jour-
nal of Communication. 2002; 52(3):657–677. https://doi.org/10.1111/j.1460-2466.2002.tb02567.x
57. Chillcoat Y, DeWine S. Teleconferencing and interpersonal communication perception. Journal of
Applied Communication Research. 1985; 13(1):14–32. https://doi.org/10.1080/00909888509388418
58. Engel D, Woolley AW, Aggarwal I, Chabris CF, Takahashi M, Nemoto K, et al. Collective intelligence in
computer-mediated collaboration emerges in different contexts and cultures. In: Proceedings of the
33rd annual ACM conference on human factors in computing systems. ACM; 2015. p. 3769–3778.
59. Amos B, Ludwiczuk B, Satyanarayanan M, et al. OpenFace: A general-purpose face recognition library with mobile applications. CMU School of Computer Science. 2016.
60. Eyben F, Weninger F, Gross F, Schuller B. Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In: Proceedings of the 21st ACM International Conference on Multimedia. ACM; 2013. p. 835–838.
61. Levitan R, Gravano A, Willson L, Benus S, Hirschberg J, Nenkova A. Acoustic-prosodic entrainment
and social behavior. In: Proceedings of the 2012 Conference of the North American Chapter of the
Association for Computational Linguistics: Human language technologies. Association for Computa-
tional Linguistics; 2012. p. 11–19.
62. Apple W, Streeter LA, Krauss RM. Effects of pitch and speech rate on personal attributions. Journal of
Personality and Social Psychology. 1979; 37(5):715. https://doi.org/10.1037/0022-3514.37.5.715
63. Degottex G, Kane J, Drugman T, Raitio T, Scherer S. COVAREP: A collaborative voice analysis repository for speech technologies. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2014. p. 960–964.
64. Pedott PR, Bacchin LB, Cáceres-Assenço AM, Befi-Lopes DM. Does the duration of silent pauses differ between words of open and closed class? Audiology-Communication Research. 2014; 19(2):153–157.
65. Baron-Cohen S, Wheelwright S, Hill J, Raste Y, Plumb I. The “Reading the Mind in the Eyes” test
revised version: A study with normal adults, and adults with Asperger syndrome or high-functioning
autism. Journal of Child Psychology and Psychiatry. 2001; 42(2):241–251. https://doi.org/10.1111/
1469-7610.00715 PMID: 11280420
66. Curry O, Chesters MJ. Putting Ourselves in the Other Fellow’s Shoes: The Role of Theory of Mind in
Solving Coordination Problems. Journal of Cognition and Culture. 2012; 12(1-2):147–159. https://doi.
org/10.1163/156853712X633974
67. DiMicco JM, Hollenbach KJ, Bender W. Using visualizations to review a group’s interaction dynamics.
In: CHI’06 extended abstracts on Human factors in computing systems. ACM; 2006. p. 706–711.
68. Shaw ME. Group dynamics: The psychology of small group behavior. McGraw Hill; 1971.