
Speaker-Listener Neural Coupling Underlies Successful Communication



Verbal communication is a joint activity; however, speech production and comprehension have primarily been analyzed as independent processes within the boundaries of individual brains. Here, we applied fMRI to record brain activity from both speakers and listeners during natural verbal communication. We used the speaker's spatiotemporal brain activity to model listeners' brain activity and found that the speaker's activity is spatially and temporally coupled with the listener's activity. This coupling vanishes when participants fail to communicate. Moreover, though on average the listener's brain activity mirrors the speaker's activity with a delay, we also find areas that exhibit predictive anticipatory responses. We connected the extent of neural coupling to a quantitative measure of story comprehension and find that the greater the anticipatory speaker-listener coupling, the greater the understanding. We argue that the observed alignment of production- and comprehension-based processes serves as a mechanism by which brains convey information.
Fig. 1. Imaging the neural activity of a speaker–listener pair during storytelling. (A) To record the speaker's speech during the fMRI scan, we used a customized MR-compatible recording device composed of two orthogonal optic microphones (Right). The source microphone captures both the background noise and the speaker's speech utterances (upper audio trace), and the reference microphone captures the background noise (middle audio trace). A dual-adaptive filter subtracts the reference input from the source channel to recover the speech (lower audio trace). (B) The speaker–listener neural coupling was assessed through the use of a general linear model in which the time series in the speaker's brain are used to predict the activity in the listeners' brains. To capture the asynchronous temporal interaction between the speaker and the listeners, the speaker's brain activity was convolved with different temporal shifts. The convolution consists of both backward shifts (up to −6 s, intervals of 1.5 s, speaker precedes) and forward shifts (up to +6 s, intervals of 1.5 s, listener precedes) relative to the moment of vocalization (0 shift). For each brain area (voxel), the speaker's local response time course is used to predict the time series of the Talairach-normalized, spatially corresponding area in the listener's brain. The model thus captures the extent to which activity in the speaker's brain during speech production is coupled over time with the activity in the listener's brain during speech comprehension.
Greg J. Stephens, Lauren J. Silbert, and Uri Hasson
Joseph Henry Laboratories of Physics, Princeton University, Princeton, NJ 08544; Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544; Neuroscience Institute, Princeton University, Princeton, NJ 08544; and Department of Psychology, Princeton University, Princeton, NJ
Communicated by Charles G. Gross, Princeton University, Princeton, NJ, June 18, 2010 (received for review April 30, 2010)
functional MRI
intersubject correlation
language production
Verbal communication is a joint activity by which interlocutors share information (1). However, little is known about the neural mechanisms underlying the transfer of linguistic information across brains. Communication between brains may be facilitated by a shared neural system dedicated to both the production and the perception/comprehension of speech (1–7). Existing neurolinguistic studies are mostly concerned with either speech production or speech comprehension, and focus on cognitive processes within the boundaries of individual brains (1). The ongoing interaction between the two systems during everyday communication thus remains largely unknown. In this study we directly examine the spatial and temporal coupling between production and comprehension across brains during natural verbal communication.
Using fMRI, we recorded the brain activity of a speaker telling an unrehearsed real-life story and the brain activity of a listener listening to a recording of the story. In the past, recording speech during an fMRI scan has been problematic due to the high levels of acoustic noise produced by the MR scanner and the distortion of the signal by traditional microphones. Thus, we used a customized MR-compatible dual-channel optic microphone that cancels the acoustic noise in real time and achieves high levels of noise reduction with negligible loss of audibility (see SI Methods and Fig. 1A). To make the study as ecologically valid as possible, we instructed the speaker to speak as if telling the story to a friend (see SI Methods for a transcript of the story and Movie S1 for an actual sample of the recording). To minimize motion artifacts induced by vocalization during an fMRI scan, we trained the speaker to produce as little head movement as possible. Next, we measured the brain activity of listeners (n = 11) listening to the recorded audio of the spoken story, thereby capturing the time-locked neural dynamics from both sides of the communication. Finally, we used a detailed questionnaire to assess the level of comprehension of each listener.
Our ability to assess speaker–listener interactions builds on recent findings that a large portion of the cortex evokes reliable and selective responses to natural stimuli (e.g., listening to a story), which are shared across all subjects (8–11). These studies use the intersubject correlation method to characterize the similarity of cortical responses across individuals during natural viewing conditions (for a recent review, see ref. 12). Here we extend these ideas by looking at the direct interaction (not induced by shared external input) between brains during communication and test whether the speaker's neural activity during speech production is coupled with the shared neural activity observed across all listeners during speech comprehension.
We hypothesize that the speaker's brain activity during production is spatially and temporally coupled with the brain activity measured across listeners during comprehension. During communication, we expect significant production/comprehension couplings to occur if speakers use their comprehension system to produce speech, and listeners use their production system to process the incoming auditory signal (3, 13, 14). Moreover, because communication unfolds over time, this coupling will exhibit important temporal structure. In particular, because the speaker's production-based processes mostly precede the listener's comprehension-based processes, the listener's neural dynamics will mirror the speaker's neural dynamics with some delay. Conversely, when listeners use their production system to emulate and predict the speaker's utterances, we expect the opposite: the listener's dynamics will precede the speaker's dynamics (14). However, when the speaker and listener are simply responding to the same shared sensory input (both speaker and listener can hear the same utterances), we predict synchronous alignment. Finally, if the neural coupling across brains serves as a mechanism by which the speaker and listener converge on the same linguistic act, the extent of coupling between a pair of conversers should predict the success of communication.
Author contributions: G.J.S., L.J.S., and U.H. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
G.J.S. and L.J.S. contributed equally to this work.
To whom correspondence should be addressed. E-mail:
This article contains supporting information online at 1073/pnas.1008662107/-/DCSupplemental.
PNAS, August 10, 2010, vol. 107, no. 32

Speaker–Listener Coupling Model. We formed a model of the expected activity in the listeners' brains during speech comprehension based on the speaker's activity during speech production (see Fig. 1B and Methods for model details). Due to both the spatiotemporal complexity of natural language and an insufficient understanding of language-related neural processes, conventional hypothesis-driven fMRI analysis methods are largely unsuitable for modeling the brain activity acquired during communication. We therefore developed an approach that circumvents the need to specify a formal model for the linguistic process in any given brain area by using the speaker's brain activity as a model for predicting the brain activity within each listener. To analyze the direct interaction of production and comprehension mechanisms, we considered only spatially local models that measure the degree of speaker–listener coupling within the same Talairach location. To capture the temporal dynamics, we first shifted the speaker's time courses backward (up to −6 s, intervals of 1.5 s, speaker precedes) and forward (up to +6 s, intervals of 1.5 s, listener precedes) relative to the moment of vocalization (0 shift). We then combined these nine shifted speaker time courses with linear weights to build a predictive model for the listener brain dynamics. Though correlations between shifted voxel time courses can complicate the interpretation of the linear weights, here these correlations are small, as shown by the mean voxel autocorrelation function (Fig. S1). The weights are thus an approximately independent measure of the contribution of the speaker dynamics for each shift. To further ensure that the minimal autocorrelations among regressors did not affect the model's temporal discriminability, we decorrelated the regressors within the model and repeated the analysis. Similar results were obtained in both cases (SI Methods).
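As an illustration of the shifted-regressor model described above, the following is a minimal sketch for a single voxel, not the authors' implementation. It assumes that both time courses are sampled every 1.5 s (so each shift is a whole number of samples), fits the weights by ordinary least squares with NumPy, and drops the edge samples that lack a full set of shifts. The names `shifted_design` and `fit_coupling` and the synthetic data are hypothetical.

```python
import numpy as np

# Assumption: one sample every 1.5 s, so shifts of -6 s ... +6 s are -4 ... +4 samples.
SAMPLE_S = 1.5
SHIFTS = np.arange(-4, 5)  # nine shifts; negative = speaker precedes


def shifted_design(speaker, shifts=SHIFTS):
    """Return (X, rows) with X[i, j] = speaker[rows[i] + shifts[j]].

    Only time points that have a full set of shifted values are kept.
    """
    speaker = np.asarray(speaker, dtype=float)
    pad = int(np.max(np.abs(shifts)))
    rows = np.arange(pad, speaker.size - pad)
    X = np.column_stack([speaker[rows + k] for k in shifts])
    return X, rows


def fit_coupling(speaker, listener, shifts=SHIFTS):
    """Least-squares weights (one per shift) and R^2 for one voxel."""
    X, rows = shifted_design(speaker, shifts)
    y = np.asarray(listener, dtype=float)[rows]
    X = X - X.mean(axis=0)  # remove means so no intercept term is needed
    y = y - y.mean()
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ weights
    r_squared = 1.0 - (residual @ residual) / (y @ y)
    return weights, r_squared


# Synthetic check (hypothetical data): the listener copies the speaker
# two samples (3 s) later, plus a little noise.
rng = np.random.default_rng(0)
speaker_ts = rng.standard_normal(200)
listener_ts = np.roll(speaker_ts, 2) + 0.1 * rng.standard_normal(200)
weights, r_squared = fit_coupling(speaker_ts, listener_ts)
peak_shift_s = SHIFTS[np.argmax(weights)] * SAMPLE_S  # shift carrying the largest weight
```

With these synthetic data the largest weight falls on the −3 s regressor, that is, on the "speaker precedes" side of the model.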
Speaker and Listener Brain Activity Exhibits Widespread Coupling During Communication. For each brain area, we identified significant speaker–listener couplings by applying an analytical F test to the overall model fit, and controlled for multiple comparisons across the volume using a fixed false discovery rate (γ = 0.05; see Methods for details). Similar results were obtained using a nonparametric permutation test (Fig. S2). Figure 2A presents the results in the left hemisphere; similar results were obtained in the right hemisphere (Fig. S3A). Significant speaker–listener coupling was found in early auditory areas (A1+), superior temporal gyrus, angular gyrus, temporoparietal junction (these areas are also known as Wernicke's area), parietal lobule, inferior frontal gyrus (also known as Broca's area), and the insula. Although the function of these regions is far from clear, they have been associated with various production and comprehension linguistic processes (15–17). Moreover, both the parietal lobule and the inferior frontal gyrus have been associated with the mirror neuron system (18). Finally, we also observed significant speaker–listener coupling in a collection of extralinguistic areas known to be involved in the processing of semantic and social aspects of the story (19), including the precuneus, dorsolateral prefrontal cortex, orbitofrontal cortex, striatum, and medial prefrontal cortex (see Table S1 for Talairach coordinates).
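The multiple-comparisons step can be sketched as follows. The text specifies only a fixed false discovery rate (γ = 0.05); the Benjamini–Hochberg step-up rule used here is an assumption, as are the function names, and the per-voxel P values are taken as given (e.g., from the F test on the overall model fit).

```python
import numpy as np


def fdr_threshold(p_values, gamma=0.05):
    """Benjamini-Hochberg cutoff (assumed procedure).

    Returns the largest sorted p[k] satisfying p[k] <= gamma * (k + 1) / m,
    or None when no p-value passes.
    """
    p = np.sort(np.asarray(p_values, dtype=float).ravel())
    m = p.size
    passes = p <= gamma * np.arange(1, m + 1) / m
    if not passes.any():
        return None
    return float(p[np.nonzero(passes)[0].max()])


def significant_voxels(p_values, gamma=0.05):
    """Boolean mask, same shape as p_values, of voxels that survive the cutoff."""
    p = np.asarray(p_values, dtype=float)
    threshold = fdr_threshold(p, gamma)
    if threshold is None:
        return np.zeros(p.shape, dtype=bool)
    return p <= threshold
```

The mask keeps the shape of its input, so it can be applied directly to a volume of P values.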
Brain areas that were coupled across the speaker and listener coincided with brain areas used to process incoming verbal information within the listeners (Fig. 2B). To compare the speaker–listener interactions (production/comprehension) with the listener–listener interactions (comprehension only), we constructed a listener–listener coupling map using similar analysis methods and statistical procedures as above. In agreement with previous work, the story evoked highly reliable activity in many brain areas across all listeners (8, 11, 12) (Fig. 2B, yellow). We note that the agreement with previous work is far from assured: the story here was both personal and spontaneous, and was recorded in the noisy environment of the scanner. The similarity in the response patterns across all listeners underscores a strong tendency to process incoming verbal information in similar ways. A comparison between the speaker–listener and the listener–listener maps reveals an extensive overlap (Fig. 2B, orange). These areas include many of the sensory-related, classic linguistic-related, and extralinguistic-related brain areas, demonstrating that many of the areas involved in speech comprehension (listener–listener coupling) are also aligned during communication (speaker–listener coupling).
Speaker–Listener Neural Coupling Emerges Only During Communication. To test whether the extensive speaker–listener coupling emerges only when information is transferred across interlocutors, we blocked the communication between speaker and listener. We repeated the experiment while recording a Russian speaker telling a story in the scanner, and then played the story to non–Russian-speaking listeners (n = 11). In this experimental setup, although the Russian speaker is trying to communicate information, the listeners are unable to extract the information from the incoming acoustic sounds. Using identical analysis methods and statistical thresholds, we found no significant coupling between the speaker and the listeners or among the listeners. At significantly lower thresholds we found that the non–Russian-speaking listener–listener coupling was confined to early auditory cortices. This indicates that the reliable activity in most areas, besides early auditory cortex, depends on a successful processing of the incoming information, and is not driven by the low-level acoustic aspects of the stimuli.
As further evidence that extensive speaker–listener couplings rely on successful communication, we asked the same English speaker to tell another unrehearsed real-life story in the scanner. We then compared her brain activity while telling the second story with the brain activity of the listeners to the original story. In this experimental setup, the speaker transmits information and the listener receives information; however, the information is decoupled across both sides of the communicative act. As in the Russian story, we found no significant coupling between the speaker and the listeners. We therefore conclude that coupling across interlocutors emerges only while engaged in shared communication.
Listeners' Brain Activity Mirrors the Speaker's Brain Activity with a Delay. Natural communication unfolds over time: speakers construct grammatical sentences based on thoughts, convert these to motor plans, and execute the plans to produce utterances; listeners analyze the sounds, build phonemes into words and sentences, and ultimately decode utterances into meaning. In our model of brain coupling, the speaker–listener temporal coupling is reflected in the model's weights, where each weight multiplies a temporally shifted time course of the speaker's brain activity relative to the moment of vocalization (synchronized alignment, zero shift). As expected, and in agreement with previous work (12), the activity among the listeners is time locked to the moment of vocalization (Fig. 3A, blue curve). In contrast, in most areas, the activity in the listeners' brains lagged behind the activity in the speaker's brain by 1–3 s (Fig. 3A, red curve). These lagged responses suggest that on average the speaker's production-based processes precede and likely induce the mirrored activity observed in the listeners' brains during comprehension. These findings also allay a methodological concern that the speaker–listener neural coupling is induced simply by the fact that the speaker is listening to her own speech.
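The mean weight curves of Fig. 3A can be sketched as below, under stated assumptions: the fitted weights are held in an array of shape (areas, 9), "normalized to unit magnitude" is read as unit Euclidean norm per area, and the SEM is taken across areas. The function name is hypothetical.

```python
import numpy as np

SHIFTS_S = np.linspace(-6.0, 6.0, 9)  # -6 s ... +6 s in 1.5 s steps


def mean_weight_profile(weights):
    """Mean and SEM across areas of unit-normalized weight vectors.

    weights: array of shape (n_areas, n_shifts), one row per coupled area.
    """
    w = np.asarray(weights, dtype=float)
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    unit = w / np.where(norms == 0.0, 1.0, norms)  # all-zero rows stay zero
    mean = unit.mean(axis=0)
    sem = unit.std(axis=0, ddof=1) / np.sqrt(unit.shape[0])
    return mean, sem
```

The shift at which the mean profile peaks, `SHIFTS_S[np.argmax(mean)]`, then gives a rough estimate of the average lag between the two brains.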
Neural Couplings Display Striking Temporal Differences Across the Brain. The temporal dynamics of the speaker–listener coupling varied across brain areas (Fig. 3B). Among significantly coupled brain areas, important differences in dynamics are contained within the weights of the different temporally shifted regressors. To assess how these patterns varied across the brain, we categorized the weights as delayed (speaker precedes, −6 to −3 s), advanced (listener precedes, 3–6 s), or synchronous (−1.5 to 1.5 s). Though such categorizations increase statistical power, they also reduce the temporal resolution of the analysis. Thus, synchronous weights reflect processes that occur both at the point of vocalization (shift 0) as well as ±1.5 s around it, whereas delayed and advanced weights require shifts of over 1.5 s. Next, for each area we performed a contrast analysis to identify brain areas in which the mean weight for each temporal category is statistically greater (P < 0.05) than the mean weight over the rest of the couplings (Methods). In early auditory areas (A1+) the speaker–listener coupling is aligned to the speech utterances (synchronized alignment; Fig. 3B, yellow); in posterior areas, including the right TPJ and the precuneus, the speaker's brain activity preceded the listeners' brain activity (speaker precedes; Fig. 3B, blue); in the striatum and anterior frontal areas, including the mPFC and dlPFC, the listeners' brain activity preceded the speaker's brain activity (listener precedes; Fig. 3B, red). To verify that our categorization of temporal couplings was independent of autocorrelations within the speaker's time series, we repeated the analysis after decorrelating the model's regressors. We found nearly exact overlap between the delayed, synchronous, and advanced maps obtained with the original and decorrelated models (97%, 97%, and 94%, respectively). The result that significant speaker–listener couplings include substantially advanced weights may be indicative of predictive processes generated by the listeners before the moment of vocalization to enhance and facilitate the processing of the incoming, noisy speech input (14). Furthermore, the spatial specificity of the temporal coupling shows that it cannot simply be attributed to nonspecific, spatially global effects such as arousal. In comparison to the speaker–listener couplings, the comprehension-based processes in the listeners' brains were entirely aligned to the moment of vocalization (Fig. 3C, yellow). Thus, the dynamics of neural coupling between the speaker and the listeners are fundamentally different from the neural dynamics shared among all listeners.
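The three temporal categories and the contrast between them can be illustrated with a short sketch. It computes only the contrast value (mean weight within a category minus mean weight over the remaining shifts) for one area; the significance test described in Methods is not reproduced, and the names below are hypothetical.

```python
import numpy as np

SHIFTS_S = np.linspace(-6.0, 6.0, 9)  # -6 s ... +6 s in 1.5 s steps

# Category masks over the nine shifts, following the ranges given in the text.
CATEGORIES = {
    "delayed": SHIFTS_S <= -3.0,             # speaker precedes, -6 to -3 s
    "synchronous": np.abs(SHIFTS_S) <= 1.5,  # -1.5 to 1.5 s
    "advanced": SHIFTS_S >= 3.0,             # listener precedes, 3 to 6 s
}


def category_contrasts(weights):
    """Mean weight inside each category minus mean weight over the other shifts."""
    w = np.asarray(weights, dtype=float)
    return {
        name: float(w[mask].mean() - w[~mask].mean())
        for name, mask in CATEGORIES.items()
    }


def dominant_category(weights):
    """Name of the category with the largest contrast for one area."""
    contrasts = category_contrasts(weights)
    return max(contrasts, key=contrasts.get)
```

Each category pools three of the nine shifts, which is the trade of temporal resolution for statistical power noted in the text.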
Fig. 2. The speaker–listener neural coupling is widespread, extending well beyond low-level auditory areas. (A) Areas in which the activity during speech production is coupled to the activity during speech comprehension. The analysis was performed on an area-by-area basis, with P values defined using an F test, and was corrected for multiple comparisons using FDR methods (γ = 0.05). The findings are presented on sagittal slices of the left hemisphere (similar results were obtained in the right hemisphere; see Fig. S3). The speaker–listener coupling is extensive and includes early auditory cortices and linguistic and extralinguistic brain areas. (B) The overlap (orange) between areas that exhibit reliable activity across all listeners (listener–listener coupling, yellow) and speaker–listener coupling (red). Note the widespread overlap between the network of brain areas used to process incoming verbal information among the listeners (comprehension-based activity) and the areas that exhibit similar time-locked activity in the speaker's brain (production/comprehension coupling). A1+, early auditory cortices; TPJ, temporal-parietal junction; dlPFC, dorsolateral prefrontal cortex; IOG, inferior occipital gyrus; Ins, insula; PL, parietal lobule; obFC, orbitofrontal cortex; PM, premotor cortex; Sta, striatum; mPFC, medial prefrontal cortex.

Extent of Speaker–Listener Neural Coupling Predicts the Success of the Communication. Humans use speech to convey information across brains. Here we administered a behavioral assessment to each listener at the end of the scan to assess the amount of information transferred from the speaker to each of the listeners (SI Methods and Fig. S4). We independently ranked both the listeners' behavioral scores and the spatial extent of significant neural coupling between the speaker and each listener, and found a strong positive correlation (r = 0.55, P < 0.07; Fig. 4A). The correlation between the neural coupling and the level of comprehension was robust to changes in the exact statistical threshold and remained stable across many statistically significant P values (Fig. S5A). These findings suggest that the stronger the neural coupling between interlocutors, the better the understanding. Finally, we computed behavioral correlations with brain regions that show coupling at different delays (speaker precedes, synchronous, listener precedes). Using these temporal categories, we analyzed the connection between each category and the level of comprehension. Remarkably, the extent of cortical areas where the listeners' activity preceded the speaker's activity (red areas in Fig. 3B; contrast P < 0.03) provided the strongest correlation with behavior (r = 0.75, P < 0.01). This suggests that prediction is an important aspect of successful communication. Furthermore, the behavioral correlation in both cases increases to r = 0.76 (P < 0.01) and r = 0.93 (P < 0.0001), respectively, when we remove a single outlier listener (ranked eighth in Fig. 4 A and B). Finally, the correlation between comprehension and neural coupling was robust to changes in the exact contrast threshold (Fig. S5B). Importantly, we note that the correlation with the level of understanding cannot be attributed to low-level processes (e.g., the audibility of the audio file), as the correlation with behavior increases when we do not include early auditory areas (synchronous alignment, yellow areas in Fig. 3B).
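The ranking step described above can be sketched as a correlation between ranks, read here as a Spearman-type rank correlation; that reading, the average-rank handling of ties, and the function names are assumptions. The inputs would be one behavioral score and one coupling extent (e.g., a count of significantly coupled voxels) per listener, and no significance test is included.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(v.size, dtype=float)
    ranks[order] = np.arange(1, v.size + 1)
    for value in np.unique(v):
        tied = v == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def rank_correlation(scores, coupling_extent):
    """Pearson correlation between the ranks of the two measures."""
    r_scores = average_ranks(scores)
    r_extent = average_ranks(coupling_extent)
    return float(np.corrcoef(r_scores, r_extent)[0, 1])
```

Because only ranks enter the calculation, the result does not depend on the units of either measure.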
Communication is a shared activity resulting in a transfer of information across brains. The findings shown here indicate that during successful communication, speakers' and listeners' brains exhibit joint, temporally coupled, response patterns (Figs. 2 and 3). Such neural coupling substantially diminishes in the absence of communication, such as when listening to an unintelligible foreign language. Moreover, more extensive speaker–listener neural couplings result in more successful communication (Fig. 4). We further show that on average the listener's brain activity mirrors the speaker's brain activity with temporal delays (Fig. 3 A and B). Such delays are in agreement with the flow of information across communicators and imply a causal relationship by which the speaker's production-based processes induce and shape the neural responses in the listener's brain. Though the sluggish BOLD response masks the exact temporal speaker–listener coupling, the delayed and advanced timescales (1–4 s) coincide with the timescales of some rudimentary linguistic processes (e.g., in this study, it took the speaker on average around 0.5 ± 0.6 s to produce words, 9 ± 5 s to produce sentences, and even longer to convey ideas). Moreover, we recently demonstrated that some high-order brain areas, such as the TPJ and the parietal lobule, have the capacity to accumulate information over many seconds (8).
Our analysis also identifies a subset of brain regions in which the activity in the listener's brain precedes the activity in the speaker's brain. The listener's anticipatory responses were localized to areas known to be involved in predictions and value representation (20–23), including the striatum and medial and dorsolateral prefrontal regions (mPFC, dlPFC). The anticipatory responses may provide the listeners with more time to process an input and can compensate for problems with noisy or ambiguous input (24). This hypothesis is supported by the finding that comprehension is facilitated by highly predictable upcoming words (25). Remarkably, the extent of the listener's anticipatory brain responses was highly correlated with the level of understanding (Fig. 4B), indicating that successful communication requires the active engagement of the listener (26, 27).
Fig. 3. Temporal asymmetry between speaker–listener and listener–listener neural couplings. (A) The mean distribution of the temporal weights across significantly coupled areas for the listener–listener (blue curve) and speaker–listener (red curve) brain pairings. For each area, the weights are normalized to unit magnitude, and error bars denote SEMs. The weight distribution within the listeners is centered on zero (the moment of vocalization). In contrast, the weight distribution between the speaker and listeners is shifted; activity in the listeners' brains lagged activity in the speaker's brain by 1–3 s. This suggests that on average the speaker's production-based processes precede and hence induce the listeners' comprehension-based processes. (B) The speaker–listener temporal coupling varies across brain areas. Based on the distribution of temporal weights within each brain area, we divided the couplings into three temporal profiles: the activity in the speaker's brain precedes (blue); the activity is synchronized within ±1.5 s around the onset of vocalization (yellow); and the activity in the listeners' brains precedes (red). In early auditory areas, the speaker–listener coupling is time locked to the moment of vocalization. In posterior areas, the activity in the speaker's brain preceded the activity in the listeners' brains; in the mPFC, dlPFC, and striatum, the listeners' brain activity preceded. Results differ slightly in right and left hemisphere. (C) The listener–listener temporal coupling is time locked to the onset of vocalization (yellow) across all brain areas in right and left hemispheres. Note that the unique speaker–listener temporal dynamics mitigate the methodological concern that the speaker's activity is similar to the listeners' activity due to the fact that the speaker is merely another listener of her own speech.

Fig. 4. The greater the extent of neural coupling between a speaker and listener the better the understanding. (A) To assess the comprehension level of each individual listener, an independent group of raters (n = 6) scored the listeners' detailed summaries of the story they heard in the scanner. We ranked the listeners' behavioral scores and the extent of significant speaker–listener coupling and found a strong positive correlation (r = 0.54, P < 0.07) between the amount of information transferred to each listener and the extent of neural coupling between the speaker and each listener (Fig. 4A). These findings suggest that the stronger the neural coupling between interlocutors, the better the understanding. (B) The extent of brain areas where the listeners' activity preceded the speaker's activity (red areas in Fig. 3B) provided the strongest correlation with behavior (r = 0.75, P < 0.01). These findings provide evidence that prediction is an important aspect of successful communication.

The notion that perception and action are coupled has long been argued by linguists, philosophers, cognitive psychologists, social psychologists, and neurophysiologists (2, 3, 24, 28–33). Our findings document the ongoing dynamic interaction between two brains during the course of natural communication, and reveal a surprisingly widespread neural coupling between the two, a priori independent, processes. Such findings are in agreement with the theory of interactive linguistic alignment (1). According to this theory, production and comprehension become tightly aligned on many different levels during verbal communication, including the phonetic, phonological, lexical, syntactic, and semantic representations. Accordingly, we observed neural coupling during communication at many different processing levels, including low-level auditory areas (induced by the shared input), production-based areas (e.g., Broca's area), comprehension-based areas (e.g., Wernicke's area and TPJ), and high-order extralinguistic areas (e.g., precuneus and mPFC) that can induce a shared contextual model of the situation (34). Interestingly, some of these extralinguistic areas are known to be involved in processing social information crucial for successful communication, including, among others, the capacity to discern the beliefs, desires, and goals of others (15, 16, 31, 35–38).
The production/comprehension coupling observed here resembles the action/perception coupling observed within mirror neurons (35). Mirror neurons discharge both when a monkey performs a specific action and when it observes the same action performed by another (39). Similarly, during the course of communication the production-based and comprehension-based processes seem to be tightly coupled to each other. Currently, however, direct proof of such a link remains elusive for two main reasons. First, mirror neurons have been recorded mainly in the ventral premotor area (F5) and the intraparietal area (PF/IPL) of the primate brain during observation and execution of rudimentary motor acts such as reaching or grabbing food. The speaker–listener neural coupling observed here extends far beyond these two areas. Furthermore, although area F5 in the macaque has been suggested to overlap with Broca's area in humans, a detailed characterization of the links between basic motor acts and complex linguistic acts is still missing (see refs. 40 and 41). Second, based on the fMRI activity recorded during production and comprehension of the same utterances, we cannot tell whether the speaker–listener coupling is generated by the activity of the same neural population that produces and encodes speech or by the activity of two intermixed but independent populations (42). Nevertheless, our findings suggest that, on the systems level, the coupling between action-based and perception-based processes is extensive and widely used across many brain areas.
The speaker–listener neural coupling exposes a shared neural
substrate that exhibits temporally aligned response patterns
across communicators. Previous studies have shown that during
free viewing of a movie or listening to a story, the external shared
input can induce similar brain activity across different individuals
(8–11, 43, 44). Verbal communication enables us to convey
information across brains, independent of the actual external
situation (e.g., telling a story of past events). Such a phenomenon may
be reflected in the ability of the speaker to directly induce similar
brain patterns in another individual, via speech, in the absence of
any other stimulation. Finally, the recording of the neural activity
from both the speaker brain and the listener brain opens a new
window into the neural basis of interpersonal communication, and
may be used to assess verbal and nonverbal forms of interaction in
both human and other model systems (45). Further understanding
of the neural processes that facilitate neural coupling across
interlocutors may shed light on the mechanisms by which our
brains interact and bind to form societies.
Subject Population. One native-English speaker, one native-Russian speaker,
and 12 native-English listeners, ages 21–30 y, participated in one or more of the
experiments. Procedures were in compliance with the safety guidelines for MRI
research and approved by the Princeton University Committee on Activities
Involving Human Subjects. All participants provided written informed consent.
Experiment and Procedure. To measure neural activity during communication,
we first used fMRI to record the brain activity of a speaker telling a long,
unrehearsed story. The speaker had three practice sessions inside the scanner
telling real-life unrehearsed stories. This gave the speaker the opportunity
to familiarize herself with the conditions of storytelling inside the
scanner and to learn to minimize head movements without compromising
storytelling effectiveness. In the final fMRI session, the speaker told a new,
nonrehearsed, real-life, 15-min account about an experience she had as
a freshman in high school (see SI Methods for the transcript). The story was
recorded using an MR-compatible microphone (see below). The speech
recording was aligned with the scanner's TTL backtick received at each TR. The
same procedure was followed for the Russian speaker, telling a nonrehearsed,
real-life story in Russian. In focusing on a personally relevant experience, we
strove both to approach the ecological setting of natural communication and
to ensure an intention to communicate by the speaker.
We measured listeners' brain activity during audio playback of the
recorded story. We synchronized the functional time series to the speaker's
vocalization through the use of Matlab code (MathWorks Inc.) written to start
the speaker's recording at the onset of the scanner's TTL backtick. Eleven
listeners listened to the recording of the English story. Ten of the listeners
(and one new subject) listened to the recording of the Russian story. None of
the listeners understood Russian. Our experimental design thus allows access
to both sides of the simulated communication. Participants were instructed
before the scan to attend as best as possible to the story, and further that they
would be asked to provide a written account of the story immediately
following the scan.
Recording System. We recorded the speaker's speech during the fMRI scan
using a customized MR-compatible recording system (FOMRI II;
Optoacoustics Ltd.). More details are described in SI Methods and in Fig. 1A.
MRI Acquisition. Subjects were scanned in a 3T head-only MRI scanner
(Allegra; Siemens). More details are described in SI Methods.
Data Preprocessing. fMRI data were preprocessed with the BrainVoyager
software package (Brain Innovation, version 1.8) and with additional soft-
ware written with Matlab. More details are described in SI Methods.
Model Analysis. The coupling between speaker–listener and listener–listener
brain pairings was assessed through the use of a spatially local general linear
model in which temporally shifted voxel time series in one brain are linearly
summed to predict the time series of the spatially corresponding voxel in
another brain. Thus for the speaker–listener coupling we have
$$v^{\mathrm{model}}_{\mathrm{listener}}(t) = \sum_{m=-\tau_{\max}}^{\tau_{\max}} \beta_m\, v_{\mathrm{speaker}}(t-m), \qquad [1]$$
where the weights $\vec{\beta}$ are determined by minimizing the RMS error and are
given by $\vec{\beta} = C^{-1}\langle \vec{v}\, v_{\mathrm{listener}} \rangle$. Here, $C$ is the covariance matrix $c_{mn} = \langle v_m v_n \rangle$
and $\vec{v}$ is the vector of shifted voxel time series, $v_m = v_{\mathrm{speaker}}(t-m)$. We
choose $\tau_{\max} = 4$, which is large enough to capture important temporal
processes while also minimizing the overall number of model parameters to
maintain statistical power. We obtain similar results with $\tau_{\max} = (3, 5)$.
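The local model of Eq. 1 can be sketched in a few lines of Python with NumPy. This is a minimal illustration and not the authors' Matlab code: the function names `shifted_design` and `fit_coupling`, the mean-centering step, and the trimming of edge time points are assumptions made for the example, and shifts are expressed in units of one sampling interval.

```python
import numpy as np

def shifted_design(v_speaker, tau_max=4):
    """Build the regressors v_m(t) = v_speaker(t - m) for m = -tau_max..tau_max.

    Assumption: time points without a full set of shifts are dropped at both ends.
    """
    T = len(v_speaker)
    shifts = np.arange(-tau_max, tau_max + 1)
    t = np.arange(tau_max, T - tau_max)              # time points where every shift exists
    X = np.column_stack([v_speaker[t - m] for m in shifts])
    return X, t, shifts

def fit_coupling(v_speaker, v_listener, tau_max=4):
    """Least-squares weights beta = C^{-1} <v v_listener> for one voxel pair."""
    X, t, shifts = shifted_design(np.asarray(v_speaker, float), tau_max)
    y = np.asarray(v_listener, float)[t]
    X = X - X.mean(axis=0)                           # assumption: mean-centered time series
    y = y - y.mean()
    C = X.T @ X / len(y)                             # covariance matrix c_mn = <v_m v_n>
    b = X.T @ y / len(y)                             # cross-covariance <v v_listener>
    beta = np.linalg.solve(C, b)
    residual = y - X @ beta
    return beta, shifts, residual
```

With `tau_max=4` the model has nine weights, one per shift. In this sketch a large weight at a positive `m` means the listener's signal follows the speaker's by `m` samples.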
We calculated the neural couplings for three brain pairings: (a) speaker,
individual listener; (b) speaker, average listener; and (c) listener, average
listener. In all three cases, the first brain in the pairing provides the
independent variables in Eq. 1. The average listener dynamics were constructed
by averaging the functional time series of the (n = 11) listeners at each
location in the brain. The (listener, average listener) pairing was constructed by
first building, for each listener, the [listener, (N − 1) listener average] pairing.
For each listener, the (N − 1) average listener is the average listener
constructed from all other listeners. We then solved the coupling model (Eq. 1).
Finally, to connect our findings to behavioral variability, we constructed the
(speaker–listener) coupling separately for the N individual listeners.
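The two averaging schemes described above can be illustrated as follows. This is a sketch under assumed conventions: the array layout (one row per listener, one column per time point, for a single voxel) and the function names `average_listener` and `leave_one_out_average` are hypothetical and not taken from the original analysis code.

```python
import numpy as np

def average_listener(listeners):
    """Mean time series across all N listeners; `listeners` has shape (N, T)."""
    return np.asarray(listeners, float).mean(axis=0)

def leave_one_out_average(listeners):
    """Row i holds the (N - 1) average listener: the mean over all listeners except i."""
    data = np.asarray(listeners, float)
    n = data.shape[0]
    total = data.sum(axis=0, keepdims=True)          # sum over all listeners
    return (total - data) / (n - 1)                  # remove listener i, then average
```

Leaving each listener out of their own reference average keeps that listener's signal from appearing on both sides of the regression.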
Statistical Analysis. We identified statistically significant couplings by
assigning P values through a Fisher's F test. In detail, the model in Eq. 1 has
$\delta_{\mathrm{model}} = 9$ degrees of freedom, while $\delta_{\mathrm{null}} = T - \delta_{\mathrm{model}} - 1$, where $T$ is the
number of time points in the experiment. For the prom story, T = 581, and T
= 451 for the Russian story. For each model fit we construct the F statistic and
associated P value $p = 1 - f(F; \delta_{\mathrm{model}}, \delta_{\mathrm{null}})$, where $f$ is the cumulative distribution
function of the F statistic. We also assigned nonparametric P values
by using a null model based on randomly permuted data (n = 1,000) at each
brain location. The nonparametric null model produced P values very close
Stephens et al. PNAS
August 10, 2010
vol. 107
no. 32
to those constructed from the F statistic (Fig. S2). We correct for multiple
statistical comparisons when displaying volume maps by controlling the false
discovery rate (FDR). Following ref. 46, we place the P values in ascending
order $(p_1 \le \dots \le p_q \le \dots \le p_{n_{\mathrm{vox}}})$ and choose the maximum value $p_q^{*}$ such that
$p_q^{*} \le (q / n_{\mathrm{vox}})\,\gamma$, where $\gamma = 0.05$ is the FDR threshold.
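The FDR step can be sketched with the standard Benjamini-Hochberg step-up rule of ref. 46. The function name `fdr_threshold` and the choice to return `None` when no voxel passes are assumptions made for this illustration.

```python
import numpy as np

def fdr_threshold(p_values, gamma=0.05):
    """Largest sorted P value p_q satisfying p_q <= gamma * q / n_vox.

    Returns None if no P value passes (an assumption of this sketch).
    """
    p = np.sort(np.asarray(p_values, float))         # ascending order p_1 <= ... <= p_n
    n_vox = p.size
    q = np.arange(1, n_vox + 1)                      # rank of each sorted P value
    passed = np.nonzero(p <= gamma * q / n_vox)[0]
    return p[passed[-1]] if passed.size else None
```

Voxels whose P value is at or below the returned threshold would then be displayed in the volume map.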
To identify significant listener–listener couplings, we applied the above
statistical analysis to the model fits across all (n = 11) listener–average
listener pairs. This is a statistically conservative approach aimed to facilitate
comparison with the speaker–listener brain pairing. The greater statistical
power contained within the (n = 11) different listener/average listener pairs
can be fully exploited using nonparametric bootstrap methods for
estimating the null distribution and calculating significant couplings (Fig. S6).
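The parametric test described in this section can be sketched as follows, assuming SciPy is available. The function name `coupling_f_test` and the textbook sum-of-squares form of the F statistic are assumptions of this example; the source specifies only the degrees of freedom and the use of the F cumulative distribution function.

```python
import numpy as np
from scipy import stats

def coupling_f_test(y, y_hat, n_params=9):
    """F statistic and P value for a fitted linear model with `n_params` regressors.

    Assumption: the usual regression F statistic, i.e. explained over residual
    variance, each divided by its degrees of freedom.
    """
    y = np.asarray(y, float)
    y_hat = np.asarray(y_hat, float)
    T = len(y)
    df_model = n_params                               # delta_model
    df_null = T - df_model - 1                        # delta_null = T - delta_model - 1
    ss_model = np.sum((y_hat - y.mean()) ** 2)        # variance explained by the model
    ss_res = np.sum((y - y_hat) ** 2)                 # residual variance
    F = (ss_model / df_model) / (ss_res / df_null)
    p = stats.f.sf(F, df_model, df_null)              # p = 1 - CDF(F)
    return F, p
```

Here `stats.f.sf` is the survival function of the F distribution, which equals one minus the cumulative distribution function evaluated at `F`.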
Coupling Categorization. For each cortical location and brain pairing, the
parameters $\vec{\beta}$ of Eq. 1 fully characterize the local neural coupling. As seen in
Fig. 3, temporal differences among the couplings reveal important differences
between action and perception. To explore these differences we categorize
each coupling as delayed (speaker precedes, −6 to −3 s), advanced
(listener precedes, 3 to 6 s), or synchronous (−1.5 to 1.5 s) based on the difference
of the mean weight within each category relative to the mean weight outside
the category. We define the statistical significance of each category through
a contrast analysis. For example, for the delayed category we define the
contrast $\vec{c} = (2, 2, 2, -1, -1, -1, -1, -1, -1)$ and associated t statistic
$t = \vec{c}\cdot\vec{\beta} \,/ \sqrt{\sigma^2\, \vec{c}^{\,T} C^{-1} \vec{c}}$, where $\sigma^2$ is the model variance and
$C = \mathrm{cov}(\vec{v})$ is the covariance of the shifted time series. The contrasts for other
categories are defined similarly. A large contrast indicates that the coupling is
dominated by weights within that particular temporal category.
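The contrast analysis can be illustrated with the standard GLM contrast t statistic. Several details here are assumptions and not the authors' exact procedure: the nine weights are taken to be ordered from the −6 s shift to the +6 s shift, the synchronous and advanced contrast vectors are built by analogy with the delayed one, and the names `CONTRASTS`, `contrast_t`, and `categorize` are hypothetical.

```python
import numpy as np

# Assumption: weights are ordered from the -6 s shift (speaker precedes)
# to the +6 s shift (listener precedes), in steps of 1.5 s.
CONTRASTS = {
    "delayed":     np.array([2, 2, 2, -1, -1, -1, -1, -1, -1], float),
    "synchronous": np.array([-1, -1, -1, 2, 2, 2, -1, -1, -1], float),  # by analogy
    "advanced":    np.array([-1, -1, -1, -1, -1, -1, 2, 2, 2], float),  # by analogy
}

def contrast_t(beta, X, residual, c):
    """Textbook contrast statistic: c.beta / sqrt(sigma^2 * c^T (X^T X)^-1 c)."""
    n, p = X.shape
    sigma2 = residual @ residual / (n - p - 1)        # residual variance estimate
    var_c = sigma2 * (c @ np.linalg.solve(X.T @ X, c))
    return float(c @ beta / np.sqrt(var_c))

def categorize(beta, X, residual):
    """Return the category with the largest t statistic, plus all three t values."""
    t_values = {name: contrast_t(beta, X, residual, c) for name, c in CONTRASTS.items()}
    return max(t_values, key=t_values.get), t_values
```

Each contrast sums to zero, so the numerator is proportional to the mean weight inside the category minus the mean weight outside it.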
Behavioral Assessment. Immediately following the scan, the participants were
asked to record the story they heard in as much detail as possible. Six
independent raters scored each of these listener records accordingly, and the
resulting score was used as a quantitative and objective measure of the
listeners' understanding. More details are provided in SI Methods and Fig. S4.
ACKNOWLEDGMENTS. We thank our colleagues Forrest Collman, Yadin
Dudai, Bruno Galantucci, Asif Ghazanfar, Adele Goldberg, David Heeger,
Chris Honey, Ifat Levy, Yulia Lerner, Rafael Malach, Stephanie E. Palmer,
Daniela Schiller, and Carrie Theisen for helpful discussion and comments on
the manuscript. G.J.S. was supported in part by the Swartz Foundation.
1. Pickering MJ, Garrod S (2004) Toward a mechanistic psychology of dialogue. Behav
Brain Sci 27:169–190, discussion 190–226.
2. Hari R, Kujala MV (2009) Brain basis of human social interaction: From concepts to
brain imaging. Physiol Rev 89:453–479.
3. Liberman AM, Mattingly IG (1985) The motor theory of speech perception revised.
Cognition 21:1–36.
4. Branigan HP, Pickering MJ, Cleland AA (2000) Syntactic co-ordination in dialogue.
Cognition 75:B13–B25.
5. Levelt WJM (1989) Speaking: From Intention to Articulation (MIT Press, Cambridge,
6. Wilson M, Knoblich G (2005) The case for motor involvement in perceiving
conspecifics. Psychol Bull 131:460–473.
7. Chang F, Dell GS, Bock K (2006) Becoming syntactic. Psychol Rev 113:234–272.
8. Hasson U, Yang E, Vallines I, Heeger DJ, Rubin N (2008) A hierarchy of temporal
receptive windows in human cortex. J Neurosci 28:2539–2550.
9. Golland Y, et al. (2007) Extrinsic and intrinsic systems in the posterior cortex of the
human brain revealed during natural sensory stimulation. Cereb Cortex 17:766–777.
10. Hasson U, Nir Y, Levy I, Fuhrmann G, Malach R (2004) Intersubject synchronization of
cortical activity during natural vision. Science 303:1634–1640.
11. Wilson SM, Molnar-Szakacs I, Iacoboni M (2008) Beyond superior temporal cortex:
Intersubject correlations in narrative speech comprehension. Cereb Cortex 18:
12. Hasson U, Malach R, Heeger D (2010) Reliability of cortical activity during natural
stimulation. Trends Cogn Sci 14:40–48.
13. Galantucci B, Fowler CA, Turvey MT (2006) The motor theory of speech perception
reviewed. Psychon Bull Rev 13:361–377.
14. Pickering MJ, Garrod S (2007) Do people use language production to make
predictions during comprehension? Trends Cogn Sci 11:105–110.
15. Fletcher PC, et al. (1995) Other minds in the brain: A functional imaging study of
"theory of mind" in story comprehension. Cognition 57:109–128.
16. Völlm BA, et al. (2006) Neuronal correlates of theory of mind and empathy: A
functional magnetic resonance imaging study in a nonverbal task. Neuroimage 29:
17. Sahin NT, Pinker S, Cash SS, Schomer D, Halgren E (2009) Sequential processing of
lexical, grammatical, and phonological information within Broca's area. Science 326:
18. Fadiga L, Craighero L, D'Ausilio A (2009) Broca's area in language, action, and music.
Ann N Y Acad Sci 1169:448–458.
19. Xu J, Kemeny S, Park G, Frattali C, Braun A (2005) Language in context: Emergent
features of word, sentence, and narrative comprehension. Neuroimage 25:
20. Koechlin E, Corrado G, Pietrini P, Grafman J (2000) Dissociating the role of the medial
and lateral anterior prefrontal cortex in human planning. Proc Natl Acad Sci USA 97:
21. Krueger F, Grafman J (2008) The human prefrontal cortex stores structured event
complexes. Understanding Events: From Perception to Action, eds Shipley T, Zacks JM
(Oxford Univ Press, New York), pp 617–638.
22. Gilbert SJ, et al. (2006) Functional specialization within rostral prefrontal cortex (area
10): A meta-analysis. J Cogn Neurosci 18:932–948.
23. Craig AD (2009) How do you feel – now? The anterior insula and human awareness.
Nat Rev Neurosci 10:59–70.
24. Garrod S, Pickering MJ (2004) Why is conversation so easy? Trends Cogn Sci 8:8–11.
25. Schwanenflugel PJ, Shoben EJ (1985) The influence of sentence constraint on the
scope of facilitation for upcoming words. J Mem Lang 24:232–252.
26. Clark HH (1996) Using Language (Cambridge Univ Press, Cambridge, UK).
27. Clark HH, Wilkes-Gibbs D (1986) Referring as a collaborative process. Cognition 22:
28. Galantucci B, Fowler CA, Goldstein L (2009) Perceptuomotor compatibility effects in
speech. Atten Percept Psychophys 71:1138–1149.
29. Merleau-Ponty M (1945) The Phenomenology of Perception (Gallimard, Paris, France).
30. Gibson JJ (1979) The Ecological Approach to Visual Perception (Houghton Mifflin,
31. Amodio DM, Frith CD (2006) Meeting of minds: The medial frontal cortex and social
cognition. Nat Rev Neurosci 7:268–277.
32. Rizzolatti G, Fadiga L, Gallese V, Fogassi L (1996) Premotor cortex and the recognition
of motor actions. Brain Res Cogn Brain Res 3:131–141.
33. Arbib M (2010) Mirror system activity for action and language is embedded in the
integration of dorsal and ventral pathways. Brain Lang 112:12–24.
34. Johnson-Laird PN (1995) Mental Models: Towards a Cognitive Science of Language,
Inference, and Consciousness (Harvard Univ Press, Cambridge, MA).
35. Gallagher HL, et al. (2000) Reading the mind in cartoons and stories: An fMRI study of
"theory of mind" in verbal and nonverbal tasks. Neuropsychologia 38:11–21.
36. Gallagher HL, Frith CD (2003) Functional imaging of "theory of mind". Trends Cogn Sci
37. Saxe R, Carey S, Kanwisher N (2004) Understanding other minds: Linking
developmental psychology and functional neuroimaging. Annu Rev Psychol 55:
38. Saxe R, Kanwisher N (2003) People thinking about thinking people. The role of the
temporo-parietal junction in "theory of mind". Neuroimage 19:1835–1842.
39. Rizzolatti G, Fogassi L, Gallese V (2001) Neurophysiological mechanisms underlying
the understanding and imitation of action. Nat Rev Neurosci 2:661–670.
40. Rizzolatti G, Arbib MA (1998) Language within our grasp. Trends Neurosci 21:
41. Arbib MA, Liebal K, Pika S (2008) Primate vocalization, gesture, and the evolution of
human language. Curr Anthropol 49:1053–1063, discussion 1063–1076.
42. Dinstein I, Thomas C, Behrmann M, Heeger DJ (2008) A mirror up to nature. Curr Biol
43. Hanson SJ, Gagliardi AD, Hanson C (2009) Solving the brain synchrony eigenvalue
problem: Conservation of temporal dynamics (fMRI) over subjects doing the same
task. J Comput Neurosci 27:103–114.
44. Jääskeläinen IP, et al. (2008) Inter-subject synchronization of prefrontal cortex
hemodynamic activity during natural viewing. Open Neuroimaging J 2:14–19.
45. Schippers MB, Roebroeck A, Renken R, Nanetti L, Keysers C (2010) Mapping the
information flow from one brain to another during gestural communication. Proc
Natl Acad Sci USA 107:9388–9393.
46. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and
powerful approach to multiple testing. J R Stat Soc B 57:289–300.