Speaker–listener neural coupling underlies successful communication

Greg J. Stephens^a,b,1, Lauren J. Silbert^c,1, and Uri Hasson^c,d,2

^a Joseph Henry Laboratories of Physics, Princeton University, Princeton, NJ 08544; ^b Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544; ^c Neuroscience Institute, Princeton University, Princeton, NJ 08544; and ^d Department of Psychology, Princeton University, Princeton, NJ 08540
Communicated by Charles G. Gross, Princeton University, Princeton, NJ, June 18, 2010 (received for review April 30, 2010)
Verbal communication is a joint activity; however, speech pro-
duction and comprehension have primarily been analyzed as
independent processes within the boundaries of individual brains.
Here, we applied fMRI to record brain activity from both speakers
and listeners during natural verbal communication. We used the
speaker’s spatiotemporal brain activity to model listeners’ brain
activity and found that the speaker’s activity is spatially and tem-
porally coupled with the listener’s activity. This coupling vanishes
when participants fail to communicate. Moreover, though on aver-
age the listener’s brain activity mirrors the speaker’s activity with
a delay, we also find areas that exhibit predictive anticipatory
responses. We connected the extent of neural coupling to a quan-
titative measure of story comprehension and find that the greater
the anticipatory speaker–listener coupling, the greater the under-
standing. We argue that the observed alignment of production-
and comprehension-based processes serves as a mechanism by
which brains convey information.
functional MRI | intersubject correlation | language production | language comprehension
Verbal communication is a joint activity by which interlocutors
share information (1). However, little is known about the
neural mechanisms underlying the transfer of linguistic in-
formation across brains. Communication between brains may be
facilitated by a shared neural system dedicated to both the pro-
duction and the perception/comprehension of speech (1–7).
Existing neurolinguistic studies are mostly concerned with either
speech production or speech comprehension, and focus on cog-
nitive processes within the boundaries of individual brains (1). The
ongoing interaction between the two systems during everyday
communication thus remains largely unknown. In this study we
directly examine the spatial and temporal coupling between pro-
duction and comprehension across brains during natural verbal
communication.
Using fMRI, we recorded the brain activity of a speaker telling
an unrehearsed real-life story and the brain activity of a listener
listening to a recording of the story. In the past, recording speech
during an fMRI scan has been problematic due to the high levels
of acoustic noise produced by the MR scanner and the distortion
of the signal by traditional microphones. Thus, we used a cus-
tomized MR-compatible dual-channel optic microphone that
cancels the acoustic noise in real time and achieves high levels of
noise reduction with negligible loss of audibility (see SI Methods
and Fig. 1A). To make the study as ecologically valid as possible,
we instructed the speaker to speak as if telling the story to
a friend (see SI Methods for a transcript of the story and Movie S1
for an actual sample of the recording). To minimize motion
artifacts induced by vocalization during an fMRI scan, we trained
the speaker to produce as little head movement as possible. Next,
we measured the brain activity of listeners (n = 11) listening to
the recorded audio of the spoken story, thereby capturing the
time-locked neural dynamics from both sides of the communi-
cation. Finally, we used a detailed questionnaire to assess the
level of comprehension of each listener.
Our ability to assess speaker–listener interactions builds on
recent findings that a large portion of the cortex evokes reliable
and selective responses to natural stimuli (e.g., listening to
a story), which are shared across all subjects (8–11). These studies
use the intersubject correlation method to characterize the sim-
ilarity of cortical responses across individuals during natural
viewing conditions (for a recent review, see ref. 12). Here we
extend these ideas by looking at the direct interaction (not in-
duced by shared external input) between brains during commu-
nication and test whether the speaker’s neural activity during
speech production is coupled with the shared neural activity ob-
served across all listeners during speech comprehension.
We hypothesize that the speaker’s brain activity during pro-
duction is spatially and temporally coupled with the brain ac-
tivity measured across listeners during comprehension. During
communication, we expect significant production/comprehension
couplings to occur if speakers use their comprehension system to
produce speech, and listeners use their production system to pro-
cess the incoming auditory signal (3, 13, 14). Moreover, because
communication unfolds over time, this coupling will exhibit im-
portant temporal structure. In particular, because the speaker’s
production-based processes mostly precede the listener’s com-
prehension-based processes, the listener’s neural dynamics will
mirror the speaker’s neural dynamics with some delay. Con-
versely, when listeners use their production system to emulate
and predict the speaker’s utterances, we expect the opposite: the
listener’s dynamics will precede the speaker’s dynamics (14).
However, when the speaker and listener are simply responding
to the same shared sensory input (both speaker and listener can
hear the same utterances), we predict synchronous alignment.
Finally, if the neural coupling across brains serves as a mecha-
nism by which the speaker and listener converge on the same
linguistic act, the extent of coupling between a pair of conversers
should predict the success of communication.
Results
Speaker–Listener Coupling Model. We formed a model of the expected activity in the listeners’ brains during speech comprehension based on the speaker’s activity during speech production (see Fig. 1B and Methods for model details). Due to both the
spatiotemporal complexity of natural language and an insufficient
understanding of language-related neural processes, conventional
hypothesis-driven fMRI analysis methods are largely unsuitable
for modeling the brain activity acquired during communication.
We therefore developed an approach that circumvents the need
Author contributions: G.J.S., L.J.S., and U.H. designed research, performed research, con-
tributed new reagents/analytic tools, analyzed data, and wrote the paper.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
1
G.J.S. and L.J.S. contributed equally to this work.
2
To whom correspondence should be addressed. E-mail: hasson@princeton.edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.
1073/pnas.1008662107/-/DCSupplemental.
to specify a formal model for the linguistic process in any given
brain area by using the speaker’s brain activity as a model for
predicting the brain activity within each listener. To analyze the
direct interaction of production and comprehension mechanisms,
we considered only spatially local models that measure the degree
of speaker–listener coupling within the same Talairach location.
To capture the temporal dynamics, we first shifted the speaker’s
time courses backward (up to −6 s, intervals of 1.5 s, speaker
precedes) and forward (up to +6 s, intervals of 1.5 s, listener
precedes) relative to the moment of vocalization (0 shift). We
then combined these nine shifted speaker time courses with linear
weights to build a predictive model for the listener brain dy-
namics. Though correlations between shifted voxel time-courses
can complicate the interpretation of the linear weights, here,
these correlations are small as shown by the mean voxel auto-
correlation function (Fig. S1). The weights are thus an approxi-
mately independent measure of the contribution of the speaker
dynamics for each shift. To further ensure that the minimal
autocorrelations among regressors did not affect the model’s
temporal discriminability, we decorrelated the regressors within
the model and repeated the analysis. Similar results were
obtained in both cases (SI Methods).
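To make the lag structure concrete, the following sketch shows how the nine shifted speaker time courses can be assembled into a design matrix for one voxel. It is written in Python/NumPy for illustration (the study's analysis used Matlab and BrainVoyager); the function name, the zero-padding at the edges, and the assumed TR of 1.5 s are ours, not the authors' code.

```python
import numpy as np

TAU_MAX = 4  # 4 TRs at an assumed TR of 1.5 s = 6 s, giving nine shifts (-6 s to +6 s)

def lagged_design(speaker_ts):
    """Stack nine temporally shifted copies of one speaker voxel time course.

    Columns run from speaker-precedes (-6 s) through the moment of
    vocalization (0 shift) to listener-precedes (+6 s); edges are
    zero-padded so every column shares the listener's time base.
    """
    T = len(speaker_ts)
    cols = []
    for shift in range(-TAU_MAX, TAU_MAX + 1):
        col = np.zeros(T)
        if shift < 0:                   # speaker precedes: regressor is the speaker's past
            col[-shift:] = speaker_ts[:shift]
        elif shift > 0:                 # listener precedes: regressor is the speaker's future
            col[:-shift] = speaker_ts[shift:]
        else:                           # synchronized with vocalization
            col[:] = speaker_ts
        cols.append(col)
    return np.column_stack(cols)        # shape (T, 9)
```

Each column then receives one linear weight, so the fitted weights index how strongly the listener's activity tracks the speaker's activity at each temporal offset.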
Speaker and Listener Brain Activity Exhibits Widespread Coupling
During Communication. For each brain area, we identified signifi-
cant speaker–listener couplings by applying an analytical F test to the overall model fit, and controlled for multiple comparisons across the volume using a fixed false discovery rate (γ = 0.05; see Methods for details). Similar results were obtained using a nonparametric permutation test (Fig. S2). Figure 2A presents the
results in the left hemisphere; similar results were obtained in the
right hemisphere (Fig. S3A). Significant speaker–listener coupling
was found in early auditory areas (A1+), superior temporal gyrus,
angular gyrus, temporoparietal junction (these areas are also
known as Wernicke’s area), parietal lobule, inferior frontal gyrus
(also known as Broca’s area), and the insula. Although the func-
tion of these regions is far from clear, they have been associated
with various production and comprehension linguistic processes
(15–17). Moreover, both the parietal lobule and the inferior
frontal gyrus have been associated with the mirror neuron system
(18). Finally, we also observed significant speaker–listener cou-
pling in a collection of extralinguistic areas known to be involved
in the processing of semantic and social aspects of the story (19),
including the precuneus, dorsolateral prefrontal cortex, orbito-
frontal cortex, striatum, and medial prefrontal cortex (see Table
S1 for Talairach coordinates).
Brain areas that were coupled across the speaker and listener
coincided with brain areas used to process incoming verbal
information within the listeners (Fig. 2B). To compare the speaker–listener interactions (production/comprehension) with the listener–listener interactions (comprehension only), we constructed a lis-
tener–listener coupling map using similar analysis methods and
statistical procedures as above. In agreement with previous work,
the story evoked highly reliable activity in many brain areas across all listeners (8, 11, 12) (Fig. 2B, yellow). We note that the agreement
with previous work is far from assured: the story here was both
personal and spontaneous, and was recorded in the noisy environ-
ment of the scanner. The similarity in the response patterns across
all listeners underscores a strong tendency to process incoming
verbal information in similar ways. A comparison between the
speaker–listener and the listener–listener maps reveals an extensive
overlap (Fig. 2B, orange). These areas include many of the sensory-
related, classic linguistic-related and extralinguistic-related brain
areas, demonstrating that many of the areas involved in speech
comprehension (listener–listener coupling) are also aligned during
communication (speaker–listener coupling).
Speaker–Listener Neural Coupling Emerges Only During Communication.
To test whether the extensive speaker–listener coupling emerges
only when information is transferred across interlocutors, we
blocked the communication between speaker and listener. We
repeated the experiment while recording a Russian speaker telling
a story in the scanner, and then played the story to non–Russian-
speaking listeners (n = 11). In this experimental setup, although
the Russian speaker is trying to communicate information, the
listeners are unable to extract the information from the incoming
acoustic sounds. Using identical analysis methods and statistical
thresholds, we found no significant coupling between the speaker
and the listeners or among the listeners. At significantly lower
thresholds we found that the non–Russian-speaking listener–
listener coupling was confined to early auditory cortices. This
indicates that the reliable activity in most areas, besides early au-
ditory cortex, depends on a successful processing of the incoming
information, and is not driven by the low-level acoustic aspects of
the stimuli.
As further evidence that extensive speaker–listener couplings
rely on successful communication, we asked the same English
speaker to tell another unrehearsed real-life story in the scanner.
We then compared her brain activity while telling the second story
with the brain activity of the listeners to the original story. In this
experimental setup, the speaker transmits information and the
Fig. 1. Imaging the neural activity of a speaker–listener pair during storytelling. (A) To record the speaker’s speech during the fMRI scan, we used a customized MR-compatible recording device composed of two orthogonal optic microphones (Right). The source microphone captures both the background noise and the speaker’s speech utterances (upper audio trace), and the reference microphone captures the background noise (middle audio trace). A dual-adaptive filter subtracts the reference input from the source channel to recover the speech (lower audio trace). (B) The speaker–listener neural coupling was assessed through the use of a general linear model in which the time series in the speaker’s brain are used to predict the activity in the listeners’ brains. To capture the asynchronous temporal interaction between the speaker and the listeners, the speaker’s brain activity was convolved with different temporal shifts. The convolution consists of both backward shifts (up to −6 s, intervals of 1.5 s, speaker precedes) and forward shifts (up to +6 s, intervals of 1.5 s, listener precedes) relative to the moment of vocalization (0 shift). For each brain area (voxel), the speaker’s local response time course is used to predict the time series of the Talairach-normalized, spatially corresponding area in the listener’s brain. The model thus captures the extent to which activity in the speaker’s brain during speech production is coupled over time with the activity in the listener’s brain during speech comprehension.
listener receives information; however, the information is decou-
pled across both sides of the communicative act. As in the Russian
story, we found no significant coupling between the speaker and
the listeners. We therefore conclude that coupling across inter-
locutors emerges only while engaged in shared communication.
Listeners’ Brain Activity Mirrors the Speaker’s Brain Activity with
a Delay. Natural communication unfolds over time: speakers
construct grammatical sentences based on thoughts, convert these
to motor plans, and execute the plans to produce utterances;
listeners analyze the sounds, build phonemes into words and
sentences, and ultimately decode utterances into meaning. In our
model of brain coupling, the speaker–listener temporal coupling
is reflected in the model’s weights, where each weight multiplies
a temporally shifted time course of the speaker’s brain activity
relative to the moment of vocalization (synchronized alignment,
zero shift). As expected, and in agreement with previous work
(12), the activity among the listeners is time locked to the mo-
ment of vocalization (Fig. 3A, blue curve). In contrast, in most
areas, the activity in the listeners’ brains lagged behind the activity
in the speaker’s brain by 1–3 s (Fig. 3A, red curve). These lagged
responses suggest that on average the speaker’s production-based
processes precede and likely induce the mirrored activity ob-
served in the listeners’ brains during comprehension. These
findings also allay a methodological concern that the speaker–
listener neural coupling is induced simply by the fact that the
speaker is listening to her own speech.
Neural Couplings Display Striking Temporal Differences Across the
Brain. The temporal dynamics of the speaker–listener coupling
varied across brain areas (Fig. 3B). Among significantly coupled
brain areas, important differences in dynamics are contained
within the weights of the different temporally shifted regressors.
To assess how these patterns varied across the brain, we catego-
rized the weights as delayed (speaker precedes, −6 to −3 s), ad-
vanced (listener precedes, 3–6 s), or synchronous (−1.5 to 1.5 s).
Though such categorizations increase statistical power, they also
reduce the temporal resolution of the analysis. Thus, synchronous
weights reflect processes that occur both at the point of vocaliza-
tion (shift 0) as well as ±1.5 s around it, whereas delayed and ad-
vanced weights require shifts of over 1.5 s. Next, for each area we
performed a contrast analysis to identify brain areas in which the
mean weight for each temporal category is statistically greater (P < 0.05) than the mean weight over the rest of the couplings (Meth-
ods). In early auditory areas (A1+) the speaker–listener coupling
is aligned to the speech utterances (synchronized alignment; Fig.
3B, yellow); in posterior areas, including the right TPJ and the
precuneus, the speaker’s brain activity preceded the listener’s
brain activity (speaker precedes; Fig. 3B, blue); in the striatum and
anterior frontal areas, including the mPFC and dlPFC, the lis-
tener’s brain activity preceded the speaker’s brain activity (listener
precedes; Fig. 3B, red). To verify that our categorization of tem-
poral couplings was independent of autocorrelations within the
speaker’s time series, we repeated the analysis after decorrelating
the model’s regressors. We found nearly exact overlap between the
delayed, synchronous, and advanced maps obtained with the
original and decorrelated models (97%, 97%, and 94%, re-
spectively). The result that significant speaker–listener couplings
include substantially advanced weights may be indicative of pre-
dictive processes generated by the listeners before the moment of
vocalization to enhance and facilitate the processing of the in-
coming, noisy speech input (14). Furthermore, the spatial speci-
ficity of the temporal coupling shows that it cannot simply be
attributed to nonspecific, spatially global effects such as arousal. In
comparison to the speaker–listener couplings, the comprehension-
based processes in the listeners’ brains were entirely aligned to the
moment of vocalization (Fig. 3C, yellow). Thus, the dynamics of
neural coupling between the speaker and the listeners are funda-
mentally different from the neural dynamics shared among
all listeners.
Extent of Speaker–Listener Neural Coupling Predicts the Success of the Communication. Humans use speech to convey information across brains. Here we administered a behavioral assessment to each
listener at the end of the scan to assess the amount of information
transferred from the speaker to each of the listeners (SI Methods and Fig. S4). We independently ranked both the listeners’ behavioral scores and the spatial extent of significant neural coupling between the speaker and each listener, and found a strong positive correlation (r = 0.55, P < 0.07; Fig. 4A). The correlation between
the neural coupling and the level of comprehension was robust to
changes in the exact statistical threshold and remained stable
across many statistically significant P values (Fig. S5A). These find-
ings suggest that the stronger the neural coupling between inter-
locutors, the better the understanding. Finally, we computed
behavioral correlations with brain regions that show coupling at
Fig. 2. The speaker–listener neural coupling is widespread, extending well beyond low-level auditory areas. (A) Areas in which the activity during speech production is coupled to the activity during speech comprehension. The analysis was performed on an area-by-area basis, with P values defined using an F test and was corrected for multiple comparisons using FDR methods (γ = 0.05). The findings are presented on sagittal slices of the left hemisphere (similar results were obtained in the right hemisphere; see Fig. S3). The speaker–listener coupling is extensive and includes early auditory cortices and linguistic and extralinguistic brain areas. (B) The overlap (orange) between areas that exhibit reliable activity across all listeners (listener–listener coupling, yellow) and speaker–listener coupling (red). Note the widespread overlap between the network of brain areas used to process incoming verbal information among the listeners (comprehension-based activity) and the areas that exhibit similar time-locked activity in the speaker’s brain (production/comprehension coupling). A1+, early auditory cortices; TPJ, temporal-parietal junction; dlPFC, dorsolateral prefrontal cortex; IOG, inferior occipital gyrus; Ins, insula; PL, parietal lobule; obFC, orbitofrontal cortex; PM, premotor cortex; Sta, striatum; mPFC, medial prefrontal cortex.
different delays (speaker precedes, synchronous, listener pre-
cedes). Using these temporal categories, we analyzed the con-
nection between each category and the level of comprehension.
Remarkably, the extent of cortical areas where the listeners’ activity preceded the speaker’s activity (red areas in Fig. 3B; contrast P < 0.03) provided the strongest correlation with behavior (r = 0.75, P < 0.01). This suggests that prediction is an important aspect of successful communication. Furthermore, the behavioral correlation in both cases increases to r = 0.76 (P < 0.01) and r = 0.93 (P < 0.0001), respectively, when we remove a single outlier listener (ranked eighth in Fig. 4 A and B). Finally, the correlation between comprehension and neural coupling was robust to changes in the exact contrast threshold (Fig. S5B). Importantly, we note that the correlation with the level of understanding cannot be attributed to low-level processes (e.g., the audibility of the audio file), as the correlation with behavior increases when we do not include early auditory areas (synchronous alignment, yellow areas in Fig. 3B).
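Because both the comprehension scores and the coupling extents are ranked before being correlated, this brain–behavior analysis amounts to a rank correlation. A minimal sketch, with placeholder data standing in for the study's per-listener measurements:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_listeners = 11

# Placeholder values for illustration only: the real inputs are the
# rater-scored story summaries (SI Methods) and, per listener, the
# spatial extent (voxel count) of significant speaker-listener coupling.
comprehension_scores = rng.uniform(0, 100, size=n_listeners)
coupling_extent = rng.integers(100, 5000, size=n_listeners)

# Correlating the ranks of the two variables is what spearmanr computes.
rho, p = spearmanr(comprehension_scores, coupling_extent)
print(f"rank correlation r = {rho:.2f}, P = {p:.3f}")
```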
Discussion
Communication is a shared activity resulting in a transfer of in-
formation across brains. The findings shown here indicate that
during successful communication, speakers’ and listeners’ brains exhibit joint, temporally coupled response patterns (Figs. 2 and
3). Such neural coupling substantially diminishes in the absence of
communication, such as when listening to an unintelligible foreign
language. Moreover, more extensive speaker–listener neural
couplings result in more successful communication (Fig. 4). We
further show that on average the listener’s brain activity mirrors
the speaker’s brain activity with temporal delays (Fig. 3 Aand B).
Such delays are in agreement with the flow of information across
communicators and imply a causal relationship by which the
speaker’s production-based processes induce and shape the neu-
ral responses in the listener’s brain. Though the sluggish BOLD
response masks the exact temporal speaker–listener coupling, the
delayed and advanced timescales (∼1–4 s) coincide with the timescales of some rudimentary linguistic processes (e.g., in this study, it took the speaker on average around 0.5 ± 0.6 s to produce words, 9 ± 5 s to produce sentences, and even longer to convey
ideas). Moreover, we recently demonstrated that some high-order
brain areas, such as the TPJ and the parietal lobule, have the
capacity to accumulate information over many seconds (8).
Our analysis also identifies a subset of brain regions in which
the activity in the listener’s brain precedes the activity in the
speaker’s brain. The listener’s anticipatory responses were local-
ized to areas known to be involved in predictions and value rep-
resentation (20–23), including the striatum and medial and
dorsolateral prefrontal regions (mPFC, dlPFC). The anticipatory
responses may provide the listeners with more time to process an
input and can compensate for problems with noisy or ambiguous
input (24). This hypothesis is supported by the finding that com-
prehension is facilitated by highly predictable upcoming words
(25). Remarkably, the extent of the listener’s anticipatory brain
responses was highly correlated with the level of understanding
(Fig. 4B), indicating that successful communication requires the
active engagement of the listener (26, 27).
The notion that perception and action are coupled has long
been argued by linguists, philosophers, cognitive psychologists,
social psychologists, and neurophysiologists (2, 3, 24, 28–33). Our
findings document the ongoing dynamic interaction between two
Fig. 3. Temporal asymmetry between speaker–listener and listener–listener
neural couplings. (A) The mean distribution of the temporal weights across
significantly coupled areas for the listener–listener (blue curve) and speaker–
listener (red curve) brain pairings. For each area, the weights are normalized
to unit magnitude, and error bars denote SEMs. The weight distribution
within the listeners is centered on zero (the moment of vocalization). In
contrast, the weight distribution between the speaker and listeners is shifted;
activity in the listeners’ brains lagged activity in the speaker’s brain by 1–3 s. This suggests that on average the speaker’s production-based processes precede and hence induce the listeners’ comprehension-based processes. (B) The speaker–listener temporal coupling varies across brain areas. Based on the distribution of temporal weights within each brain area, we divided the couplings into three temporal profiles: the activity in the speaker’s brain precedes (blue); the activity is synchronized within ±1.5 s around the onset of vocalization (yellow); and the activity in the listener’s brain precedes (red). In early auditory areas, the speaker–listener coupling is time locked to the moment of vocalization. In posterior areas, the activity in the speaker’s brain preceded the activity in the listeners’ brains; in the mPFC, dlPFC, and striatum, the listeners’ brain activity preceded. Results differ slightly in the right and left hemispheres. (C) The listener–listener temporal coupling is time locked to the onset of vocalization (yellow) across all brain areas in the right and left hemispheres. Note that the unique speaker–listener temporal dynamics mitigates the methodological concern that the speaker’s activity is similar to the listeners’ activity
due to the fact that the speaker is merely another listener of her own speech.
Fig. 4. The greater the extent of neural coupling between a speaker and
listener the better the understanding. (A) To assess the comprehension level
of each individual listener, an independent group of raters (n = 6) scored the listeners’ detailed summaries of the story they heard in the scanner. We ranked the listeners’ behavioral scores and the extent of significant speaker–listener coupling and found a strong positive correlation (r = 0.54, P < 0.07) between the amount of information transferred to each listener and the extent of neural coupling between the speaker and each listener. These findings suggest that the stronger the neural coupling between interlocutors, the better the understanding. (B) The extent of brain areas where the listeners’ activity preceded the speaker’s activity (red areas in Fig. 3B) provided the strongest correlation with behavior (r = 0.75, P < 0.01).
These findings provide evidence that prediction is an important aspect of
successful communication.
brains during the course of natural communication, and reveal
a surprisingly widespread neural coupling between the two,
a priori independent, processes. Such findings are in agreement
with the theory of interactive linguistic alignment (1). According
to this theory, production and comprehension become tightly
aligned on many different levels during verbal communication,
including the phonetic, phonological, lexical, syntactic, and se-
mantic representations. Accordingly, we observed neural cou-
pling during communication at many different processing levels,
including low-level auditory areas (induced by the shared input),
production-based areas (e.g., Broca’s area), comprehension-
based areas (e.g., Wernicke’s area and TPJ), and high-order ex-
tralinguistic areas (e.g., precuneus and mPFC) that can induce a shared contextual model of the situation (34). Interestingly, some
of these extralinguistic areas are known to be involved in pro-
cessing social information crucial for successful communication,
including, among others, the capacity to discern the beliefs,
desires, and goals of others (15, 16, 31, 35–38).
The production/comprehension coupling observed here
resembles the action/perception coupling observed within mirror
neurons (35). Mirror neurons discharge both when a monkey
performs a specific action and when it observes the same action
performed by another (39). Similarly, during the course of com-
munication the production-based and comprehension-based
processes seem to be tightly coupled to each other. Currently,
however, direct proof of such a link remains elusive for two main
reasons. First, mirror neurons have been recorded mainly in the
ventral premotor area (F5) and the intraparietal area (PF/IPL) of
the primate brain during observation and execution of rudimen-
tary motor acts such as reaching or grabbing food. The speaker–
listener neural coupling observed here extends far beyond these
two areas. Furthermore, although area F5 in the macaque has
been suggested to overlap with Broca’s area in humans, a detailed
characterization of the links between basic motor acts and com-
plex linguistic acts is still missing (see refs. 40 and 41). Second,
based on the fMRI activity recorded during production and
comprehension of the same utterances, we cannot tell whether the
speaker–listener coupling is generated by the activity of the same
neural population that produces and encodes speech or by the
activity of two intermixed but independent populations (42).
Nevertheless, our findings suggest that, on the systems level, the
coupling between action-based and perception-based processes is
extensive and widely used across many brain areas.
The speaker–listener neural coupling exposes a shared neural
substrate that exhibits temporally aligned response patterns
across communicators. Previous studies have shown that during
free viewing of a movie or listening to a story, the external shared
input can induce similar brain activity across different individuals
(8–11, 43, 44). Verbal communication enables us to convey in-
formation across brains, independent of the actual external situ-
ation (e.g., telling a story of past events). Such a phenomenon may
be reflected in the ability of the speaker to directly induce similar
brain patterns in another individual, via speech, in the absence of
any other stimulation. Finally, the recording of the neural activity from both the speaker’s brain and the listener’s brain opens a new
window into the neural basis of interpersonal communication, and
may be used to assess verbal and nonverbal forms of interaction in
both human and other model systems (45). Further understanding
of the neural processes that facilitate neural coupling across
interlocutors may shed light on the mechanisms by which our
brains interact and bind to form societies.
Methods
Subject Population. One native-English speaker, one native-Russian speaker, and 12 native-English listeners, ages 21–30 y, participated in one or more of the experiments. Procedures were in compliance with the safety guidelines for MRI research and approved by the Princeton University Committee on Activities Involving Human Subjects. All participants provided written informed consent.
Experiment and Procedure. To measure neural activity during communication,
we first used fMRI to record the brain activity of a speaker telling a long, un-
rehearsed story. The speaker had three practice sessions inside the scanner
telling real-life unrehearsed stories. This allowed for the opportunity for the
speaker to familiarize herself with the conditions of storytelling inside the
scanner and to learn to minimize head movements without compromising
storytelling effectiveness. In the final fMRI session, the speaker told a new,
nonrehearsed, real-life, 15-min account about an experience she had as
a freshman in high school (see SI Methods for the transcript). The story was
recorded using an MR-compatible microphone (see below). The speech re-
cording was aligned with the scanner’s TTL backtick received at each TR. The
same procedure was followed for the Russian speaker, telling a nonrehearsed,
real-life story in Russian. In focusing on a personally relevant experience, we
strove both to approach the ecological setting of natural communication and
to ensure an intention to communicate by the speaker.
We measured listeners’ brain activity during audio playback of the recorded story. We synchronized the functional time series to the speaker’s vocalization through the use of Matlab code (MathWorks Inc.) written to start the speaker’s recording at the onset of the scanner’s TTL backtick. Eleven listeners listened to the recording of the English story. Ten of the listeners
(and one new subject) listened to the recording of the Russian story. None of
the listeners understood Russian. Our experimental design thus allows access
to both sides of the simulated communication. Participants were instructed
before the scan to attend as best as possible to the story, and further that they
would be asked to provide a written account of the story immediately fol-
lowing the scan.
Recording System. We recorded the speaker’s speech during the fMRI scan
using a customized MR-compatible recording system (FOMRI II; Opto-
acoustics Ltd.). More details are described in SI Methods and in Fig. 1A.
MRI Acquisition. Subjects were scanned in a 3T head-only MRI scanner
(Allegra; Siemens). More details are described in SI Methods.
Data Preprocessing. fMRI data were preprocessed with the BrainVoyager
software package (Brain Innovation, version 1.8) and with additional soft-
ware written with Matlab. More details are described in SI Methods.
Model Analysis. The coupling between speaker–listener and listener–listener brain pairings was assessed through the use of a spatially local general linear model in which temporally shifted voxel time series in one brain are linearly summed to predict the time series of the spatially corresponding voxel in another brain. Thus for the speaker–listener coupling we have

$$v^{\mathrm{model}}_{\mathrm{listener}}(t) = \sum_{\tau = -\tau_{\max}}^{\tau_{\max}} \beta_\tau \, v_{\mathrm{speaker}}(t + \tau), \qquad [1]$$

where the weights $\vec{\beta}$ are determined by minimizing the RMS error and are given by $\vec{\beta} = C^{-1} \langle \vec{v} \, v_{\mathrm{listener}} \rangle$. Here, $C$ is the covariance matrix $C_{mn} = \langle v_m v_n \rangle$ and $\vec{v}$ is the vector of shifted voxel time series, $v_m = v_{\mathrm{speaker}}(t - m)$. We choose $\tau_{\max} = 4$, which is large enough to capture important temporal processes while also minimizing the overall number of model parameters to maintain statistical power. We obtain similar results with $\tau_{\max} = 3$ or $5$.
We calculated the neural couplings for three brain pairings: (a) speaker, individual listener; (b) speaker, average listener; and (c) listener, average listener. In all three cases, the first brain in the pairing provides the independent variables in Eq. 1. The average listener dynamics was constructed by averaging the functional time series of the (n = 11) listeners at each location in the brain. The (listener, average listener) pairing was constructed by first building, for each listener, the [listener, (N − 1) listener average] pairing. For each listener, the (N − 1) average listener is the average constructed from all other listeners. We then solved the coupling model (Eq. 1). Finally, to connect our findings to behavioral variability, we constructed the (speaker, individual listener) coupling separately for the N individual listeners.
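As a sketch of how Eq. 1 can be solved for a single voxel pair, the snippet below performs the least-squares fit using the lagged_design helper sketched earlier; demeaning the time courses makes the normal equations match the covariance form in the text. This is an illustration under our assumptions, not the authors' Matlab implementation.

```python
import numpy as np

def fit_coupling(speaker_ts, listener_ts):
    """Fit Eq. 1 for one voxel pair: beta = C^{-1} <v v_listener>.

    Both inputs are length-T time courses from the same Talairach
    location. Returns the nine coupling weights and the fraction of
    listener variance the model explains (r^2).
    """
    x = speaker_ts - speaker_ts.mean()
    y = listener_ts - listener_ts.mean()
    X = lagged_design(x)                 # (T, 9) shifted speaker regressors
    C = X.T @ X                          # covariance matrix C_mn = <v_m v_n>
    beta = np.linalg.solve(C, X.T @ y)   # RMS-error-minimizing weights
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / (y @ y)
    return beta, r2
```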
Statistical Analysis. We identified statistically significant couplings by assigning P values through a Fisher F test. In detail, the model in Eq. 1 has $\delta_{\mathrm{model}} = 9$ degrees of freedom, while $\delta_{\mathrm{null}} = T - \delta_{\mathrm{model}} - 1$, where $T$ is the number of time points in the experiment. For the prom story, T = 581, and T = 451 for the Russian story. For each model fit we construct the F statistic and associated P value $p = 1 - f(F; \delta_{\mathrm{model}}, \delta_{\mathrm{null}})$, where $f$ is the cumulative distribution function of the F statistic. We also assigned nonparametric P values by using a null model based on randomly permuted data (n = 1,000) at each brain location. The nonparametric null model produced P values very close to those constructed from the F statistic (Fig. S2). We correct for multiple statistical comparisons when displaying volume maps by controlling the false discovery rate (FDR). Following ref. 46, we place the P values in ascending order $(p_1 \ldots p^*_q \ldots p_{n_{\mathrm{vox}}})$ and choose the maximum value $p^*_q$ such that $p^*_q \le (q / n_{\mathrm{vox}}) \, \gamma$, where $\gamma = 0.05$ is the FDR threshold.
To identify significant listener–listener couplings, we applied the above statistical analysis to the model fits across all (n = 11) listener–average listener pairs. This is a statistically conservative approach aimed to facilitate
comparison with the speaker–listener brain pairing. The greater statistical
power contained within the (n= 11) different listener/average listener pairs
can be fully exploited using nonparametric bootstrap methods for estimat-
ing the null distribution and calculating significant couplings (Fig. S6).
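In code, the parametric test and the FDR correction described above might look as follows (a sketch following the Benjamini–Hochberg step-up rule of ref. 46; the function names and the F-statistic form in terms of r^2 are our assumptions):

```python
import numpy as np
from scipy.stats import f as f_dist

def coupling_pvalue(r2, T, k=9):
    """P value for the overall fit of Eq. 1 via an F test.

    k = 9 model degrees of freedom (one per shift weight); the null
    has T - k - 1 degrees of freedom, with T time points (e.g. 581).
    """
    dof_null = T - k - 1
    F = (r2 / k) / ((1.0 - r2) / dof_null)
    return 1.0 - f_dist.cdf(F, k, dof_null)

def fdr_cutoff(pvals, gamma=0.05):
    """Largest ordered p*_q satisfying p*_q <= (q / n_vox) * gamma."""
    p = np.sort(np.asarray(pvals))
    q = np.arange(1, p.size + 1)
    passing = p <= q * gamma / p.size
    # every P value at or below the cutoff is declared significant
    return p[passing][-1] if passing.any() else 0.0
```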
Coupling Categorization. For each cortical location and brain pairing, the parameters $\vec{\beta}$ of Eq. 1 fully characterize the local neural coupling. As seen in Fig. 3, temporal differences among the couplings reveal important differences between action and perception. To explore these differences we categorize each coupling as delayed (speaker precedes, −6 to −3 s), advanced (listener precedes, 3–6 s), or synchronous (−1.5 to 1.5 s) based on the difference of the mean weight within each category relative to the mean weight outside the category. We define the statistical significance of each category through a contrast analysis. For example, for the delayed category we define the contrast $\vec{c} = (2, 2, 2, -1, -1, -1, -1, -1, -1)$ and associated t statistic

$$t = \frac{\vec{c} \cdot \vec{\beta}}{\sqrt{(1 - r^2) \left( \vec{c}^{\,\dagger} \cdot \mathrm{cov}(\vec{v}) \cdot \vec{c} \right)}},$$

where $r^2$ is the model variance and $\mathrm{cov}(\vec{v})$ is the covariance of the shifted time series. The contrasts for other categories are defined similarly. A large contrast indicates that the coupling is dominated by weights within that particular temporal category.
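A sketch of the three category contrasts and the associated t statistic (assuming beta, r2, and the lagged design matrix X come from the fit above; the variable names are ours):

```python
import numpy as np

# Nine weights correspond to shifts (-6, -4.5, -3, -1.5, 0, 1.5, 3, 4.5, 6) s.
# Each zero-sum contrast compares the mean weight inside a temporal
# category against the mean weight outside it.
C_DELAYED     = np.array([ 2,  2,  2, -1, -1, -1, -1, -1, -1], float)  # speaker precedes
C_SYNCHRONOUS = np.array([-1, -1, -1,  2,  2,  2, -1, -1, -1], float)  # near vocalization
C_ADVANCED    = np.array([-1, -1, -1, -1, -1, -1,  2,  2,  2], float)  # listener precedes

def contrast_t(beta, contrast, r2, X):
    """t = (c . beta) / sqrt((1 - r^2) * c' cov(v) c), as in the Methods."""
    cov_v = np.cov(X, rowvar=False)     # covariance of the shifted regressors
    return (contrast @ beta) / np.sqrt((1.0 - r2) * (contrast @ cov_v @ contrast))
```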
Behavioral Assessment. Immediately following the scan, the participants were asked to record the story they heard in as much detail as possible. Six independent raters scored each of these listener records accordingly, and the resulting score was used as a quantitative and objective measure of the listener’s understanding. More details are provided in SI Methods and Fig. S4.
ACKNOWLEDGMENTS. We thank our colleagues Forrest Collman, Yadin
Dudai, Bruno Galantucci, Asif Ghazanfar, Adele Goldberg, David Heeger,
Chris Honey, Ifat Levy, Yulia Lerner, Rafael Malach, Stephanie E. Palmer,
Daniela Schiller, and Carrie Theisen for helpful discussion and comments on
the manuscript. G.J.S. was supported in part by the Swartz Foundation.
1. Pickering MJ, Garrod S (2004) Toward a mechanistic psychology of dialogue. Behav Brain Sci 27:169–190, discussion 190–226.
2. Hari R, Kujala MV (2009) Brain basis of human social interaction: From concepts to
brain imaging. Physiol Rev 89:453–479.
3. Liberman AM, Mattingly IG (1985) The motor theory of speech perception revised.
Cognition 21:1–36.
4. Branigan HP, Pickering MJ, Cleland AA (2000) Syntactic co-ordination in dialogue.
Cognition 75:B13–B25.
5. Levelt WJM (1989) Speaking: From Intention to Articulation (MIT Press, Cambridge,
MA).
6. Wilson M, Knoblich G (2005) The case for motor involvement in perceiving
conspecifics. Psychol Bull 131:460–473.
7. Chang F, Dell GS, Bock K (2006) Becoming syntactic. Psychol Rev 113:234–272.
8. Hasson U, Yang E, Vallines I, Heeger DJ, Rubin N (2008) A hierarchy of temporal
receptive windows in human cortex. J Neurosci 28:2539–2550.
9. Golland Y, et al. (2007) Extrinsic and intrinsic systems in the posterior cortex of the
human brain revealed during natural sensory stimulation. Cereb Cortex 17:766–777.
10. Hasson U, Nir Y, Levy I, Fuhrmann G, Malach R (2004) Intersubject synchronization of
cortical activity during natural vision. Science 303:1634–1640.
11. Wilson SM, Molnar-Szakacs I, Iacoboni M (2008) Beyond superior temporal cortex:
Intersubject correlations in narrative speech comprehension. Cereb Cortex 18:
230–242.
12. Hasson U, Malach R, Heeger D (2010) Reliability of cortical activity during natural
stimulation. Trends Cogn Sci 14:40–48.
13. Galantucci B, Fowler CA, Turvey MT (2006) The motor theory of speech perception
reviewed. Psychon Bull Rev 13:361–377.
14. Pickering MJ, Garrod S (2007) Do people use language production to make
predictions during comprehension? Trends Cogn Sci 11:105–110.
15. Fletcher PC, et al. (1995) Other minds in the brain: A functional imaging study of
“theory of mind”in story comprehension. Cognition 57:109–128.
16. Völlm BA, et al. (2006) Neuronal correlates of theory of mind and empathy: A
functional magnetic resonance imaging study in a nonverbal task. Neuroimage 29:
90–98.
17. Sahin NT, Pinker S, Cash SS, Schomer D, Halgren E (2009) Sequential processing of
lexical, grammatical, and phonological information within Broca’s area. Science 326:
445–449.
18. Fadiga L, Craighero L, D’Ausilio A (2009) Broca’s area in language, action, and music.
Ann N Y Acad Sci 1169:448–458.
19. Xu J, Kemeny S, Park G, Frattali C, Braun A (2005) Language in context: Emergent
features of word, sentence, and narrative comprehension. Neuroimage 25:
1002–1015.
20. Koechlin E, Corrado G, Pietrini P, Grafman J (2000) Dissociating the role of the medial
and lateral anterior prefrontal cortex in human planning. Proc Natl Acad Sci USA 97:
7651–7656.
21. Krueger F, Grafman J (2008) The human prefrontal cortex stores structured event
complexes. Understanding Events: From Perception to Action, eds Shipley T, Zacks JM
(Oxford Univ Press, New York), pp 617–638.
22. Gilbert SJ, et al. (2006) Functional specialization within rostral prefrontal cortex (area
10): A meta-analysis. J Cogn Neurosci 18:932–948.
23. Craig AD (2009) How do you feel—now? The anterior insula and human awareness.
Nat Rev Neurosci 10:59–70.
24. Garrod S, Pickering MJ (2004) Why is conversation so easy? Trends Cogn Sci 8:8–11.
25. Schwanenflugel PJ, Shoben EJ (1985) The influence of sentence constraint on the
scope of facilitation for upcoming words. J Mem Lang 24:232–252.
26. Clark HH (1996) Using Language (Cambridge Univ Press, Cambridge, UK).
27. Clark HH, Wilkes-Gibbs D (1986) Referring as a collaborative process. Cognition 22:
1–39.
28. Galantucci B, Fowler CA, Goldstein L (2009) Perceptuomotor compatibility effects in
speech. Atten Percept Psychophys 71:1138–1149.
29. Merleau-Ponty M (1945) The Phenomenology of Perception (Gallimard, Paris, France).
30. Gibson JJ (1979) The Ecological Approach to Visual Perception (Houghton Mifflin,
Boston).
31. Amodio DM, Frith CD (2006) Meeting of minds: The medial frontal cortex and social
cognition. Nat Rev Neurosci 7:268–277.
32. Rizzolatti G, Fadiga L, Gallese V, Fogassi L (1996) Premotor cortex and the recognition
of motor actions. Brain Res Cogn Brain Res 3:131–141.
33. Arbib M (2010) Mirror system activity for action and language is embedded in the
integration of dorsal and ventral pathways. Brain Lang 112:12–24.
34. Johnson-Laird PN (1995) Mental Models: Towards a Cognitive Science of Language,
Inference, and Consciousness (Harvard Univ Press, Cambridge, MA).
35. Gallagher HL, et al. (2000) Reading the mind in cartoons and stories: An fMRI study of
‘theory of mind’ in verbal and nonverbal tasks. Neuropsychologia 38:11–21.
36. Gallagher HL, Frith CD (2003) Functional imaging of ‘theory of mind’. Trends Cogn Sci 7:77–83.
37. Saxe R, Carey S, Kanwisher N (2004) Understanding other minds: Linking
developmental psychology and functional neuroimaging. Annu Rev Psychol 55:
87–124.
38. Saxe R, Kanwisher N (2003) People thinking about thinking people. The role of the
temporo-parietal junction in “theory of mind.” Neuroimage 19:1835–1842.
39. Rizzolatti G, Fogassi L, Gallese V (2001) Neurophysiological mechanisms underlying
the understanding and imitation of action. Nat Rev Neurosci 2:661–670.
40. Rizzolatti G, Arbib MA (1998) Language within our grasp. Trends Neurosci 21:
188–194.
41. Arbib MA, Liebal K, Pika S (2008) Primate vocalization, gesture, and the evolution of
human language. Curr Anthropol 49:1053–1063, discussion 1063–1076.
42. Dinstein I, Thomas C, Behrmann M, Heeger DJ (2008) A mirror up to nature. Curr Biol
18:R13–R18.
43. Hanson SJ, Gagliardi AD, Hanson C (2009) Solving the brain synchrony eigenvalue
problem: Conservation of temporal dynamics (fMRI) over subjects doing the same
task. J Comput Neurosci 27:103–114.
44. Jääskeläinen IP, et al. (2008) Inter-subject synchronization of prefrontal cortex
hemodynamic activity during natural viewing. Open Neuroimaging J 2:14–19.
45. Schippers MB, Roebroeck A, Renken R, Nanetti L, Keysers C (2010) Mapping the
information flow from one brain to another during gestural communication. Proc
Natl Acad Sci USA 107:9388–9393.
46. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and
powerful approach to multiple testing. J R Stat Soc B 57:289–300.