Neural mechanisms of lipreading in the Polish-speaking population: effects of linguistic complexity and sex differences
Jakub Wojciechowski1,2,5, Joanna Beck1,3,5, Hanna Cygan1, Agnieszka Pankowska4 & Tomasz Wolak1

1Bioimaging Research Center, Institute of Physiology and Pathology of Hearing, 10 Mochnackiego St, Warsaw 02-042, Poland. 2Nencki Institute of Experimental Biology, Polish Academy of Sciences, 3 Pasteur St, Warsaw 02-093, Poland. 3Medical Faculty, Lazarski University, Warsaw 02-662, Poland. 4Rehabilitation Clinic, Institute of Physiology and Pathology of Hearing, 10 Mochnackiego St, Warsaw 02-042, Poland. 5Jakub Wojciechowski and Joanna Beck: Equal first authors. email: joannaludwikabeck@gmail.com
Lipreading, the ability to understand speech by observing lips and facial movements, is a vital communication skill that enhances speech comprehension in diverse contexts, such as noisy environments. This study examines the neural mechanisms underlying lipreading in the Polish-speaking population, focusing on the complexity of linguistic material and potential sex differences in lipreading ability. A cohort of 51 participants (26 females) underwent a behavioral lipreading test and an fMRI-based speech comprehension task utilizing visual-only and audiovisual stimuli, manipulating the lexicality and grammar of the linguistic materials. Results indicated that males and females did not differ significantly in objective lipreading skills, though females rated their subjective abilities higher. Neuroimaging revealed increased activation in regions associated with speech processing, such as the superior temporal cortex, when participants engaged in visual-only lipreading compared to the audiovisual condition. Lexicality of visual-only material engaged distinct neural pathways, highlighting the role of motor areas in visual speech comprehension. These findings contribute to understanding the neurocognitive processes in lipreading, suggesting that visual speech perception is a multimodal process involving extensive brain regions typically associated with auditory processing. The study underscores the potential of lipreading training in rehabilitating individuals with hearing loss and informs the development of assistive technologies.
Keywords Lipreading, Speech-reading, Audiovisual integration, Sex differences, fMRI, Language comprehension
Lipreading is the ability to extract speech information from the movements of a speaker's lips and face. Far from being a specialized skill limited to those with hearing impairments, it plays a significant role in everyday communication across the general population. It is particularly vital in environments where auditory cues are insufficient or absent, such as noisy public spaces or situations where individuals must maintain silence. Visual information from the talker's face helps fill in the missing auditory information (e.g.1,2). The universality of lipreading is underscored by its role in early communication development, with infants showing sensitivity to visual speech cues even before they develop full auditory speech capabilities3. Articulatory lip movements enable the recognition of visemes (the visual equivalents of phonemes) and supplement degraded auditory information during speech perception. Despite its practical importance, the neural and cognitive mechanisms underlying lipreading remain incompletely understood, although recent advances in neuroscience and psychology have shed new light on the neural networks involved in visual speech perception and the role of visual cues in speech comprehension (e.g.4,5).
In particular, neuroimaging studies have shown that the brain regions involved in lipreading overlap with those involved in auditory speech processing, suggesting that lipreading relies on neural mechanisms similar to those of normal hearing, including the auditory cortex6,7. Additionally, despite the simplification of lipreading as "hearing without sounds", brain regions associated with language processing, such as the left inferior frontal gyrus (IFG)
and posterior superior temporal gyrus (pSTG), as well as the visual cortex, are also activated8,9. Furthermore, audiovisual integration during lipreading involves the superior temporal sulcus and pSTG10,11. These regions are similarly engaged in auditory speech perception and comprehension both in individuals with hearing impairments and in normal-hearing populations12,13.
The contribution of visual processing to lipreading has been highlighted in recent studies. Peelle et al. (2022)4 conducted a brain imaging study to investigate the neural mechanisms underlying audiovisual integration processes. The researchers examined the brain activity of 60 healthy adults while they processed visual-only, auditory-only, and audiovisual words. The results revealed enhanced connectivity between the auditory, visual, and premotor cortex during audiovisual speech processing compared to unimodal processing. Furthermore, during visual-only speech processing, there was increased connectivity between the posterior superior temporal sulcus (pSTS) and the primary visual cortex (V1), but not the primary auditory cortex (A1), across most experimental conditions. The authors proposed that the pSTS region might be crucial in integrating visual information with an existing auditory-based perception. These findings supported earlier research by Zhu and Beauchamp (2017)14, who found that different regions of the pSTS preferred visually presented faces with either moving mouths or moving eyes, with only the mouth-preferring regions exhibiting a strong response to voices. However, what remains unclear and continues to be debated is the involvement of the premotor cortex in speech perception across various paradigms, particularly in terms of lexicality and the modality of the stimulus4,15.
Moreover, the presence of visual-related responses in the superior temporal cortex (STC) of individuals who are deaf may be attributed to long-term auditory deprivation, such as the absence of auditory sensory input. However, it could also be influenced by other dynamic cognitive functions, such as the acquisition of sign language12. Previous research has shown that activity in the STC positively correlates with the duration of deafness or the age at which cochlear implants were received16–19, indicating that functional reorganization likely occurs in the auditory cortex over an extended period. A systematic review and a meta-analysis addressing how increased STC activation in response to visual speech relates to improved speech understanding revealed that STC activation corresponds to the ability to read lips and understand speech rather than to the duration of sensory deprivation20,21. This suggests that the compensatory changes resulting from sensory deprivation do not necessarily require a gradual integration of visual inputs into the STC. Instead, they are rapidly modulated by preexisting connections from higher-level cortical areas associated with language processing. Hence, the reorganization of the STC may involve contributions from both bottom-up signals (e.g., originating from the visual cortex) and top-down modulation (e.g., originating from the frontal-temporal regions) to facilitate such cross-modal activity22.
Research has provided evidence that extended training in lipreading can bring about structural and functional changes in the brain regions involved in visual and auditory processing among proficient lip readers9,23,24. Furthermore, studies have demonstrated neuroplasticity related to lipreading in deaf individuals, who rely heavily on lipreading and exhibit heightened visual processing in brain areas typically associated with auditory processing25,26. These findings contribute to our understanding of how lipreading supports speech perception and have potential implications for rehabilitation strategies and the development of assistive technologies for individuals with hearing impairments.
Previously, audiovisual integration was often regarded as an "individual difference" variable, unrelated to unimodal processing abilities27,28. However, Tye-Murray et al. (2016)29 demonstrated that word recognition scores for auditory-only and visual-only stimuli accurately predicted audiovisual speech perception performance, with no evidence of a distinct integrative ability factor. These findings may suggest that audiovisual speech perception relies primarily on coordinating auditory and visual inputs. In summary, while significant insights have been gained into the neural mechanisms of lipreading and its overlap with auditory speech processing, the specific involvement of the premotor cortex and how it varies by lexicality and stimulus modality during lipreading remains poorly understood and debated.
What is more, gender appears to play an important role in lipreading, although findings on sex differences have been inconsistent. Some studies suggest that women outperform men in this skill. For instance, one study30 found that women performed better than men in a lipreading task requiring participants to identify speech sounds solely from visual cues, and another31 reported higher lipreading accuracy for women when identifying sentences from visual cues alone. However, other studies have found no significant differences in lipreading accuracy between men and women32. In terms of neural mechanisms, there is evidence that women and men may engage different neural pathways during lipreading. For example, studies33,34 found that females exhibited greater activity in the left auditory area while lipreading silently articulated numbers, despite recognition accuracy similar to males. This suggests potential sex-based differences in neural processing, even in the absence of behavioral differences. Overall, the literature is inconsistent, leaving the nature and causes of these differences unclear. To account for this variability, some studies have chosen to focus exclusively on one sex, predominantly females, to minimize between-sex variability (e.g.35).
Behavioral studies provide a valuable framework for gaining a deeper understanding of neurobiological findings. The context in which words and sentences are presented plays a significant role in lipreading accuracy. Compared to isolated sentences, lipreading accuracy is enhanced when sentences are presented within a meaningful context, such as a story1,36. This means that lipreading relies on visual cues from the speaker's lips as well as contextual information. Factors such as the visibility of the speaker's face and the distinctiveness of lip movements also influence lipreading accuracy37.
Furthermore, linguistic factors, including the complexity of words and sentences, can impact lipreading accuracy38. Research has demonstrated a connection between lipreading ability and auditory perception, whereby individuals with better lipreading skills tend to exhibit superior auditory perception skills, particularly in noisy environments10,39. This relationship appears to stem from the integration of audiovisual information rather
Scientic Reports | (2025) 15:13253 2
| https://doi.org/10.1038/s41598-025-98026-8
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
than reliance on one modality over the other. Some studies23 suggest that shared cognitive mechanisms, such as attention and memory, support both lipreading and auditory perception, enhancing speech comprehension in noisy settings. Furthermore, performance on auditory-only and visual-only tasks has been shown to independently predict audiovisual speech perception29, indicating that lipreading complements rather than substitutes for auditory processing. These findings highlight the dynamic interplay between modalities, wherein lipreading may augment auditory perception even in less challenging conditions, as demonstrated by the McGurk effect27. Lipreading and auditory perception are intertwined and rely on shared cognitive processes such as attention, memory, integration of multisensory information, and language processing. Importantly, training programs focusing on visual speech perception have been shown to enhance lipreading skills12,40, highlighting the potential for improvement in this domain. These findings underscore the potential of lipreading training for rehabilitating individuals with hearing loss or speech perception difficulties. First, however, it is essential to gain a deeper understanding of the neurocognitive processes underlying this phenomenon, as well as of the task-dependent and subject-dependent variability.
Building upon these identified gaps in the literature, this study aims to elucidate the neural mechanisms underlying lipreading within the Polish-speaking population, with a focus on distinguishing between visual-only and audiovisual speech processing modalities. Our primary objective was to explore how the complexity of linguistic material influences the neural processing of lipreading, and how these processes differ when both auditory and visual cues are present versus when only visual cues are available. We expected that, for both the audiovisual and the visual-only (lipreading) conditions, we would observe differences in brain regions involved in grammatical processing. The anterior temporal lobe (ATL) houses a lexicon of objects and events, vital for identifying and abstracting common elements into new concepts. These concepts, such as "green leaf", illustrate the ATL's role in semantic processing and conceptual integration. At the same time, the posterior parietal cortex (PPC) serves as a critical hub for integrating sensory information and coordinating attentional resources during speech processing and oral reading. For the lipreading conditions, we also assumed involvement of the premotor cortex (PMv), as it plays a crucial role in planning and executing the motor movements necessary for articulating speech. It coordinates with areas like the posterior frontal eye fields (pFEF) and FEF, which are involved in controlling visual attention and eye movements, respectively, during the visual processing of speech-related cues41. Moreover, we sought to examine potential differences in lipreading ability and neural activation patterns between male and female participants, thereby contributing to the understanding of sex-specific cognitive processing in multimodal and unimodal communication contexts. We hypothesized that women would outperform men in lipreading skills, on both subjective and objective measures. Furthermore, we anticipated that women would exhibit a more specialized pattern of brain activation during the lipreading condition, specifically in the STC.
Methodology
Participants
Participants were recruited through social media. Out of 55 recruited participants, three were trained and practicing language therapists and one did not pay attention to the tasks at hand; these four were therefore excluded from further analysis. After exclusion, the sample consisted of 26 females and 25 males, aged 25.51 ± 6.55 years. All participants were native Polish speakers, right-handed, and reported normal hearing, normal or corrected-to-normal (with contact lenses) vision, and no psychiatric or neurological disorders.
All participants signed informed consent forms and received monetary compensation for their time. The study was approved by the research ethics committee of Nicolaus Copernicus University in Toruń and was conducted in accordance with the Declaration of Helsinki.
Lipreading comprehension test
Initially, participants watched a short video clip with sound, featuring an actress (a trained speech therapist specializing in working with the hearing impaired) narrating a 20-second story. This served to acquaint them with the actress's speech characteristics, such as speech rate and tone. Subsequently, we assessed each participant's lipreading ability with a behavioral task conducted before the fMRI examination: participants viewed a different, silent, 44-second video clip of the same actress narrating a story on a specific topic (food), which was known to them in advance. After watching the video, participants were provided with a printed list of words and asked to identify those spoken by the actress. Points were awarded for correctly marked words and deducted for incorrectly marked ones. The highest achievable score was 21, while the lowest was −21.
Additionally, after each lipreading trial during the fMRI procedure, participants subjectively rated how much they understood from the lipreading video by choosing a score on a 7-point Likert scale (see Fig. 1).
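As a concrete illustration of the scoring rule, a minimal Python sketch follows (the word list itself is not reproduced in the paper, so the items below are hypothetical placeholders; only the +1/−1 rule and the score range are taken from the text):

```python
def lipreading_score(marked_words, spoken_words):
    """Score the printed word-list task: +1 for each correctly marked
    (actually spoken) word, -1 for each incorrectly marked one."""
    marked, spoken = set(marked_words), set(spoken_words)
    return len(marked & spoken) - len(marked - spoken)

# Hypothetical example: 21 target words were spoken in the clip, so marking
# all of them (and nothing else) gives the maximum score of 21, while marking
# only words that were never spoken drives the score towards -21.
spoken = {f"target_{i}" for i in range(21)}            # placeholder item names
print(lipreading_score(spoken, spoken))                # -> 21
print(lipreading_score({"distractor_1", "distractor_2"}, spoken))  # -> -2
```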
Lipreading fMRI procedure
During fMRI acquisition, participants performed a lipreading task. The task consisted of various visual and audiovisual materials spoken by the actress and a subsequent question about the comprehension of each material. To explore the brain's processing of visual lexical stimuli, we used three experimental conditions. These conditions included materials spoken by the same actress: (1) naturally, together with sound (audiovisual lexical); (2) naturally, but without sound (visual lexical); (3) a clip played backwards and without sound (visual non-lexical). In addition, to investigate the role of the type of linguistic material in the lexical processing of visual stimuli, each of the above conditions was implemented in the form of either narrative sentences or strings of words. The narrative sentences had a simple grammatical construction and were related to everyday life. All the words were nouns and were selected from the set of nouns occurring in the narrative sentences. Sample experimental stimuli are available online. As a control condition, we used a still photo of the voice actress with no acoustic stimulation. Consequently, we used six experimental conditions and one control condition. Each trial of the task began with information (1 s) about the topic of the upcoming language material and whether it
Scientic Reports | (2025) 15:13253 3
| https://doi.org/10.1038/s41598-025-98026-8
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
would be sentences or words. Then a clip of the language material was displayed (20 s) in line with the described experimental conditions. After the clip ended, a 7-point scale (4 s) appeared, allowing participants to indicate their comprehension level of the presented language material. Each trial ended with a fixation cross (4 s). Participants subjectively rated their comprehension of the presented material using a response pad held in their right hand. The task was divided into six parts, lasting 4:03 min each, to conform with optimal fMRI sequence length. In order to avoid participants' confusion and for more robust fMRI data modeling, each of the six task parts contained either only words or only sentences, in an alternating order. The first part always had only sentences, the second only words, and so on. The order of conditions inside each part of the task was semi-randomised to avoid the same condition occurring twice in a row. Materials were related to sport, weather, food or fashion. The experimental protocol was controlled by Presentation software (Neurobehavioral Systems Inc.). The stimuli were presented via a mirror mounted on the MR head coil and displayed on an LCD screen (NordicNeuroLab AS) inside the MRI room. Behavioral responses were collected using MR-compatible response pads (SmitsLab).

Fig. 1. The lipreading task design. Each trial started with an instruction indicating whether full sentences or a string of words would be presented, and the topic of the material (e.g., sport). Following the instruction, 20 s of speech material was presented in one of three variants: (1) a clip with sound (audiovisual lexical), (2) without sound (visual lexical), (3) without sound and played backwards (non-lexical visual). Additionally, 25% of the time, a static face of the voice actress without speech material (static-control) was presented instead. After the speech material, an interactive scale was presented and participants were instructed to indicate how well they understood the speech material. A fixation cross was presented for 4 s after each trial.
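For clarity, the resulting factorial structure and trial timing can be summarized as follows (a sketch only; the condition labels are ours, the timings are those reported above):

```python
from itertools import product

# 3 presentation variants x 2 material types = 6 experimental conditions,
# plus a static-face control shown without any speech material.
variants = ["audiovisual_lexical", "visual_lexical", "visual_nonlexical"]
materials = ["sentences", "words"]
conditions = [f"{v}_{m}" for v, m in product(variants, materials)] + ["static_control"]

# Trial timeline in seconds: topic/material cue, speech clip (or static face),
# 7-point comprehension rating, fixation cross.
trial_timeline = [("cue", 1), ("speech_material", 20), ("rating", 4), ("fixation", 4)]

print(len(conditions))                    # -> 7
print(sum(d for _, d in trial_timeline))  # -> 29 s per trial
```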
MRI acquisition
Neuroimaging was performed using a 3 T Siemens Prisma MRI scanner equipped with a 20-channel phased-array RF head coil. Functional data for all tasks were acquired using a multi-band echo-planar imaging sequence (TR = 1500 ms, TE = 27 ms, flip angle = 90°, FOV = 192 × 192 mm, 94 × 94 image matrix, 48 transversal slices of 2.4 mm slice thickness, voxel size of 2.0 × 2.0 × 2.4 mm, Slice Accel. Factor = 2, In-Plane Accel. Factor = 2, IPAT = 4, TA = 4:03 min per run). Structural images were collected with a T1-weighted 3D MPRAGE sequence (TR = 2300 ms, TE = 2.26 ms, TI = 900 ms, 8° flip angle, FOV = 208 × 230 mm, image matrix 232 × 256, voxel size of 0.9 × 0.9 × 0.9 mm, 208 slices of 0.90 mm slice thickness, TA = 4:53 min).
Behavioral analysis
To test whether males and females differ in terms of lipreading skills, we ran Student's t-tests to compare objective lipreading comprehension measured before neuroimaging as well as subjective comprehension levels reported during the main lipreading task. Additionally, we ran Pearson correlations to check the relation between subjective and objective skill. All analyses were conducted and plotted using R42 with a cut-off p-value of 0.05. All scripts and data used for behavioral analysis are available here: https://osf.io/6k74t/.
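The analyses above were run in R42; purely for illustration, an equivalent sketch in Python could look as follows (the file name and column names are assumptions, not the actual layout of the shared data):

```python
import pandas as pd
from scipy import stats

# Hypothetical layout: one row per participant with sex, the objective
# pre-scan lipreading score, and the mean subjective rating from the scanner task.
df = pd.read_csv("lipreading_behavioral.csv")
females = df[df["sex"] == "F"]
males = df[df["sex"] == "M"]

# Sex comparison of objective lipreading skill (Student's t-test)
t_obj, p_obj = stats.ttest_ind(females["objective_score"], males["objective_score"])

# Sex comparison of subjective comprehension in the visual lexical condition
t_sub, p_sub = stats.ttest_ind(females["subjective_lexical"], males["subjective_lexical"])

# Relation between objective skill and subjective lexical comprehension
r, p_r = stats.pearsonr(df["objective_score"], df["subjective_lexical"])

print(f"objective skill: t = {t_obj:.2f}, p = {p_obj:.3f}")
print(f"subjective lexical: t = {t_sub:.2f}, p = {p_sub:.3f}")
print(f"objective vs. subjective: r = {r:.2f}, p = {p_r:.3f}")
```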
Neuroimaging data preprocessing and analysis
Neuroimaging data were preprocessed using SPM1243. Functional data were spatially realigned to the mean image, to which the structural image was then co-registered. Segmentation and normalization to the common MNI space were performed based on the high-resolution structural images, with resampling to 1 mm isometric voxels. The obtained transformation parameters were applied to the functional volumes, with resampling to 2 mm isometric voxels. The normalized functional images were spatially smoothed with a Gaussian kernel of 6 mm full width at half maximum (FWHM) and high-pass filtered at 0.004 Hz (time constant of 256 s).
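For readers who wish to reproduce this chain, the same steps can be scripted through the SPM12 interfaces in Nipype; the sketch below is illustrative only (file names are placeholders, and any parameter not reported above is left at its SPM default):

```python
from nipype.interfaces import spm

# Realign all functional runs to their mean image
realign = spm.Realign(in_files=["run1.nii", "run2.nii"], register_to_mean=True)
realign_out = realign.run().outputs

# Co-register the structural image to the mean functional image
spm.Coregister(target=realign_out.mean_image, source="t1.nii").run()

# Segment the structural image and normalize it to MNI space, then apply the
# estimated deformation to the functional volumes (resampled to 2 mm voxels)
norm = spm.Normalize12(image_to_align="t1.nii",
                       apply_to_files=realign_out.realigned_files,
                       write_voxel_sizes=[2.0, 2.0, 2.0])
norm_out = norm.run().outputs

# Smooth with a 6 mm FWHM Gaussian kernel; the 256 s (0.004 Hz) high-pass
# filter is applied later, at the GLM specification stage
spm.Smooth(in_files=norm_out.normalized_files, fwhm=[6.0, 6.0, 6.0]).run()
```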
Statistical modeling of fMRI data was performed in SPM12 using a general linear model. The period of speech material (20 s) was modeled for each condition type, resulting in four regressors of interest (bi-modal, lipreading lexical, lipreading non-lexical, static-control) per fMRI run, since in every run either only words or only sentences were presented. As a consequence, the first run had four regressors related to words, the second run had four regressors related to sentences, and so on in an alternating fashion. Additionally, six head-movement parameters obtained from the realignment procedure were added to the model as nuisance regressors for each run. For each participant, contrasts between estimated parameters (β values) of conditions were computed by subtraction.
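A minimal sketch of such a first-level model, written here with nilearn rather than the SPM12 batch actually used, may help make the regressor structure concrete (onsets, file names and the events layout are illustrative assumptions):

```python
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

# Hypothetical events for one "sentences" run: each 20 s speech block is
# modeled by its condition; the cue, rating scale and fixation are unmodeled.
events = pd.DataFrame({
    "onset": [5.0, 34.0, 63.0, 92.0],          # illustrative onsets (s)
    "duration": [20.0] * 4,
    "trial_type": ["audiovisual_lexical", "lipreading_lexical",
                   "lipreading_nonlexical", "static_control"],
})

# Six realignment parameters enter the model as nuisance regressors
motion = pd.read_csv("run1_motion.txt", sep=r"\s+", header=None,
                     names=[f"motion_{i}" for i in range(6)])

glm = FirstLevelModel(t_r=1.5, hrf_model="spm",
                      high_pass=1.0 / 256,   # 256 s high-pass filter
                      smoothing_fwhm=None)   # data already smoothed
glm = glm.fit("run1_preprocessed.nii", events=events, confounds=motion)

# Subject-level contrast of estimated parameters, e.g. lexical lipreading
# of sentences versus the static face control
zmap = glm.compute_contrast("lipreading_lexical - static_control")
```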
For second-level group analysis, we performed a series of one-sample t-tests on the contrast estimates. We conducted analyses in three domains. First, we tested the effects of lexical lipreading processing separately for sentences and words. Second, we compared brain responses during lipreading of sentences and words, separately for the lexical and non-lexical lipreading conditions. Third, based on previous literature highlighting possible sex differences in lipreading ability, we compared all the above contrasts between male and female participants using a series of two-sample t-tests. All neuroimaging figures were plotted using the BrainNet Viewer toolbox44.
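The group-level tests follow the standard summary-statistics approach; as an illustration (not the SPM12 random-effects analysis actually used), a nilearn sketch of the one-sample and female-vs-male two-sample tests on one contrast might look like this (paths and group coding are placeholders):

```python
import pandas as pd
from nilearn.glm.second_level import SecondLevelModel

# One first-level contrast image per participant (placeholder paths)
imgs = [f"sub-{i:02d}_con_visual_lexical_vs_face.nii" for i in range(1, 52)]
sex = ["F"] * 26 + ["M"] * 25

# One-sample t-test: group mean of the contrast
design_one = pd.DataFrame({"intercept": [1] * len(imgs)})
one_sample = SecondLevelModel().fit(imgs, design_matrix=design_one)
z_group = one_sample.compute_contrast("intercept", output_type="z_score")

# Two-sample t-test: female vs. male difference on the same contrast
design_two = pd.DataFrame({"female": [int(s == "F") for s in sex],
                           "male":   [int(s == "M") for s in sex]})
two_sample = SecondLevelModel().fit(imgs, design_matrix=design_two)
z_sex = two_sample.compute_contrast([1, -1], output_type="z_score")

# In the paper, maps were thresholded at voxel-level p < .001 with
# cluster-level FWE correction (p < .05) in SPM12.
```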
Results
Behavioral results
Results showed that males and females did not differ in terms of objective lipreading skill, but they did differ in terms of subjective lexical lipreading skill: females judged their lipreading comprehension level higher than males (see Table 1; Fig. 2). All the statistics and means for objective and subjective skill are listed in Table 1.
Additionally, objective lipreading comprehension was positively correlated with subjective lexical lipreading comprehension, both for females and males (r = .43; p < .001), as well as with the difference between lexical and non-lexical lipreading comprehension levels (r = .47; p < .001; see Fig. 3).

Table 1. Participants' demographic and lipreading-related assessment.
Variable                                        Female (N = 26)   Male (N = 25)   Statistics
Age                                             29.25             27.85           t = 0.77, p = .447
Objective skill score                           3.29              2.74            t = 0.59, p = .556
Subjective skill score for visual lexical       4.31              3.26            t = 3.38, p = .001
Subjective skill score for visual non-lexical   1.52              1.45            t = 0.49, p = .627

Fig. 2. Sex differences for subjective and objective lipreading comprehension levels.

Fig. 3. Correlation between objective lipreading skill and subjective lexical lipreading comprehension levels, and the difference between lexical and non-lexical lipreading comprehension levels.
Neuroimaging results
Note that in the main text of the manuscript we do not report tables with voxel-wise statistics; they are reported in the supplementary materials. Additionally, all of the reported results are available as unthresholded maps in the NeuroVault repository. Figures with regions involved in audiovisual word and sentence processing (Figure S1) and the conjunction analysis for 'visual lexical vs. face' with 'audiovisual lexical vs. face' (Figure S2) can also be found in the supplementary materials.
Sentences conditions
When examining the processing of visual lexical sentences in comparison to static face images, increased activation was noted in areas associated with speech processing, such as the bilateral middle and superior temporal cortex. Additionally, stronger activity was observed in the bilateral frontal and middle superior frontal areas, which encompass the supplementary motor area (SMA). Furthermore, the bilateral occipital cortex and bilateral caudate also exhibited heightened activation in response to lexical sentence processing. Results from these contrasts should be interpreted as control results, reflecting the sensitivity of our paradigm (Fig. 4, Tables S7 & S8).
When we compared activation during speech processing of audiovisual and visual lexical sentences, we found higher activation for audiovisual sentences in bilateral temporal and parietal areas. In bilateral frontal, parietal (cuneus, PPC) and occipital areas we observed the opposite pattern, i.e., higher activation for visual lexical sentences (Fig. 4, Tables S9 & S10).
Additionally, we checked which regions were involved in visual lexical processing during sentence reading in comparison to non-lexical stimuli. We found differences in the bilateral superior and middle temporal gyri (notably smaller in the right hemisphere) and in the left supplementary motor area. Visual non-lexical sentences activated the right hemisphere more strongly and involved the STG/planum temporale (PT) and medial dorsolateral prefrontal cortex (Fig. 4, Tables S11 & S12).

Fig. 4. Brain map activations for visual lexical sentences compared to the static face condition (left), the lexical audiovisual condition (middle) and the visual non-lexical condition (right). Contrast maps are thresholded at voxel-level p < .001 and FWE-corrected (p < .05) for cluster size.
Words conditions
For words, as with sentences, we checked which regions were involved in visual lexical word processing in comparison to the static face image. Similarly, we found higher activation in speech-related areas, i.e., the bilateral middle and superior temporal cortex, bilateral frontal and middle superior frontal areas (i.e., SMA), bilateral occipital cortex and bilateral caudate. Those results should be interpreted as control results, reflecting the sensitivity of our paradigm (Fig. 5, Tables S13 & S14).
For visual compared to audiovisual lexical word processing, we observed higher activation in the bilateral inferior and middle frontal, bilateral inferior and superior parietal, and bilateral middle and inferior occipital areas, whereas for audiovisual words we observed higher activation in bilateral superior and middle temporal areas, bilateral middle superior frontal gyrus, bilateral precuneus, bilateral lingual gyrus and superior occipital gyrus (Fig. 5, Tables S15 & S16).
Lastly, we checked which areas were involved in visual lexical word processing (vs. non-lexical words), and we found areas of the language network, i.e., bilateral SMA, bilateral middle frontal areas, left superior frontal gyrus and IFG, left middle temporal gyrus and STG, and left precentral gyrus. Similarly to sentences, non-lexical words more strongly activated the right STG/PT and middle occipital gyrus (MOG), supramarginal and angular gyri, with a small cluster in the fusiform gyrus (Fig. 5, Tables S17 & S18).
Visual conditions
Comparing brain activations during the processing of visual lipreading of words and sentences, we observed higher activation for sentences in the bilateral precuneus, bilateral cingulate gyrus, bilateral middle frontal gyrus and left inferior temporal gyrus. On the other hand, for visual word processing, we observed heightened activation in bilateral occipital areas (including the left fusiform gyrus), bilateral IFG, right cerebellum, right pre- and post-central gyri, and left STG (Fig. 6, Tables S3 & S4). Comparing brain activity during the processing of audiovisual words and sentences, we observed more extensive differences, but in the same areas as in the visual lexical condition (see Supplementary materials).
For a similar comparison without lexical meaning, we observed differences in the same areas as in the lexical comparison, but without the bilateral medial superior frontal gyrus (Fig. 6, Tables S5 & S6).
Fig. 6. Brain map activations for visual lexical sentences vs. words comparisons (left) and visual lexical words vs. sentences comparisons (right). Contrast maps are thresholded at voxel-level p < .001 and FWE-corrected (p < .05) for cluster size.
Fig. 5. Brain map activations for lexical visual words compared to the static face condition (left), the lexical audiovisual condition (middle) and the visual non-lexical condition (right). Contrast maps are thresholded at voxel-level p < .001 and FWE-corrected (p < .05) for cluster size.
Scientic Reports | (2025) 15:13253 7
| https://doi.org/10.1038/s41598-025-98026-8
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Additionally, comparing visual lexical sentences vs. lexical words to visual non-lexical sentences vs. non-lexical words, one cluster of activity difference was observed in the anterior cingulate cortex, with the peak at x = 4, y = 30, z = 34 (158 voxels, t = −5.01).
As evident from our observations, the activation maps for both lexical and non-lexical comparisons displayed
notable similarities.
Sex dierences
We observed no dierences in brain activity between males and females for any of the contrasts of the lipreading
conditions.
Discussion
The aim of this study was to investigate the neural underpinnings of visual speech processing during lipreading. To achieve this, we designed an fMRI-based speech comprehension task to examine three key aspects of speech processing: (1) varying levels of semantic involvement (words vs. sentences), (2) lexicality of the speech material (regular vs. backward-played), and (3) the modality of speech perception (with vs. without auditory input). Our primary objective was to explore the neural mechanisms underlying lipreading, focusing on specific regions including the anterior temporal lobe (ATL), posterior parietal cortex (PPC), and premotor cortex (PMC). We hypothesized that these regions would show significant activity during visual-only and audiovisual speech processing, with the ATL and PPC associated with linguistic complexity and the PMC engaged during visual lexical processing. Furthermore, we hypothesized heightened activity in the superior temporal cortex (STC) for female participants, reflecting potential sex-based differences in neural processing. Below, we detail how the observed results aligned with these expectations.
Neuronal activity in both lexical and non-lexical comparisons during the processing of words and sentences showed some similarities in activation patterns. In turn, the differentiating patterns suggest that non-lexical stimuli do not activate (or activate less strongly) the frontal and temporal areas of the language network (Figs. 4 and 5). A left-lateralized activation pattern observed in the SFG and IFG, particularly Broca's area, reinforces its significance in word processing, even in the absence of auditory input. Differences (enhanced activity) in the superior temporal sulcus and middle temporal gyrus (MTG) related to facial expression during lexical lipreading suggest that participants were actively engaged in phoneme and lexical encoding and were also involved in the retrieval of the semantic lexicon, in line with14.
Interestingly, both variants of visual non-lexical stimuli (words and sentences), when compared to visual lexical stimuli, elicited enhanced activation in the right-hemisphere STG. The voiceless speech played backwards (non-lexical) contained detectable atypical eye gaze and speech-like lip movements that did not match the expected linguistic code. Non-coherent and unexpected lip and eye movements may have triggered right pSTS activity, known for its role in eye-gaze and facial expression comprehension45 and face-voice integration46 during communication. This interpretation is also supported by the involvement of the medial dorsolateral prefrontal cortex in response to non-lexical sentences. These regions, known for their engagement in various cognitive functions, including working memory and lexical retrieval47, appear to contribute significantly to the complex set of processes involved in speech recognition and the interpretation of non-verbal communication.
For visual lexical words, the involvement of the SMA, related to the coordination of speech-related motor movements, has been consistently implicated in language-related tasks48,49. A growing body of clinical neurosurgical and neuropsychological data confirms the central role of the SMA in speech production, including the initiation, automatization, and monitoring of motor execution. Degeneration of white matter tracts connecting the SMA to relevant cortical areas underlies symptoms of progressive speech apraxia and aphasia50. On the other hand, clinical dysfunction of the SMA does not affect language comprehension51. Although our initial hypothesis focused on the PMC, the observed activation in the SMA aligns with our broader expectation that motor regions are involved in visual speech processing. The SMA, as part of the motor network, may play a complementary or overlapping role with the PMC in coordinating speech-related movements and analyzing visemes during lipreading.
Our ndings suggest that motor aspects of speech may be especially important in visual speech comprehension
than in audiovisual speech comprehension. is seems to be particularly true for the task where visually
presented words appear in isolation and speech movements can be easily observed and analyzed via executive
motor nodes. is was not the case in visual sentence comprehension, during which it was more dicult to
extract and analyze visemes via the executive motor system and, consequently, lipreading was less eective.
When sentences and words were processed without voice, there were still observable dierences in brain
activation, though they were less extensive than with voice (Fig.6, Figure S1). e eects found in the anterior
and inferior temporal poles indicate a dierential role for semantic information retrieval52 in reading words and
sentences, likely due to the complexity and diculty of the linguistic material. Additionally, ATL plays a central
role in integrating semantic and syntactic information and is particularly sensitive to meaningful sentences53.
e observed activation of the ATL during visual sentence processing aligns with our hypothesis, supporting
its role in semantic integration and syntactic processing. In contrast, the temporoparietal junction’s dierential
involvement in reading words and sentences may be due to the high cognitive demands during sentence
recognition and the involvement of extensive attentional resources in analyzing lip movements.
Modality plays a crucial role in brain activation during language recognition. For conditions without voice, we observed stronger activation in temporal, occipital and frontal areas than in the static face condition, and in occipital and frontal areas in comparison to the condition with voice. The role of the visual system in lipreading is significant from an early stage of processing54. As Paulesu et al. (2003)55 summed up, the study by Campbell (1997)56 focused on patient L.M., who had bilateral lesions in the middle temporal area/V5. This area, identified by Zeki et al. (1991)57, plays a crucial role in visual motion perception. L.M. exhibited a significant impairment in lipreading and was notably less susceptible to the fusion illusion. Visual modality-specific representations of speech have been supported by further studies (for a review, see58). Recent research highlights the existence of a visuo-phonological mapping process during lipreading, which is additionally supported by complex input from motor and premotor cortices and posterior temporal areas59. These findings collectively suggest that the phonological analysis of speech is a multimodal process, incorporating both visual cues (such as lip movements) and auditory information. This supports the notion that speech perception involves the integration of visual and auditory elements, in line with55.
Based on the results regarding brain activation during processing with and without auditory input, and as we expected, we posit that modality plays a pivotal role in brain activation during language recognition. Furthermore, when information from one of the required inputs (auditory in this study) is lacking, the involvement of language-related regions is stronger and covers larger areas, possibly reflecting increased processing effort. Indeed, higher language ability has been associated with both increases and decreases in neural activity, with considerable variation regarding the brain areas involved, and a range of interpretations has been proposed to explain these findings60. Increased activity in areas of the cortical language network, such as the left angular gyrus, Broca's area, and the left temporal lobe, has been hypothesized to reflect deeper semantic processing and greater sensitivity to semantic relationships between sentences during comprehension tasks61,62. A similar effect can be found when comparing brain activity during the comprehension of texts on familiar versus unfamiliar topics, which could also be explained by deeper semantic processing of familiar than unfamiliar content63,64. Negative relationships between brain activity and language ability have typically been interpreted as neural efficiency65. This concept is characterized by reduced brain activity in individuals with higher ability compared to those with lower ability, despite equal or superior performance61. Other researchers have suggested automatization processes to explain reduced neural activity in subjects with high language ability, as skilled readers engage in more automated and efficient processing66. The neural engagement observed in response to various semantic stimuli, involving key areas such as the IFG/Broca's area, ATL, pSTS, pMTG, and the left STG, underscores the significance of considering visual speech reception as an influential processing modality involved in language comprehension. This insight contributes to a more comprehensive understanding of how linguistic information is perceived and interpreted in the brain.
Our results also add another piece to the puzzle of sex differences in lipreading skill and its brain mechanisms. We did not find any significant differences at the behavioral or neurobiological level, which contrasts with30 and is in line with31 or67. While null results do not demonstrate a lack of effect, the current study used more recent neuroimaging acquisition and processing techniques, and the sample size was larger than those of previous fMRI studies. It is therefore likely that the effects of sex differences in the neural processing of speech reading are small. Moreover, there is conflicting information regarding sex differences in visual speech comprehension, likely stemming from the diverse range of protocols employed. These protocols have varied from syllable-based assessments to tests involving word and sentence comprehension68. In this study, we explored straightforward words and sentences. In line with the hypothesis that women excel at speech-reading continuous speech fragments, we anticipated that as task demands increased, sex differences would become more apparent69. Although we hypothesized heightened activity in the STC for female participants, no significant differences were observed, suggesting that sex-related effects in the neural processing of lipreading may be subtle or influenced by task complexity.
However, our behavioral results showed that while males and females do not differ in objective lipreading skills, they do report differences in subjective assessments of these skills. Cultural and societal expectations may influence individuals' self-perception of their lipreading abilities. Stereotypes about sex roles and communication skills might lead females to perceive themselves as more adept at tasks like lipreading, even when objective measures do not support this distinction70,71. Additionally, differences in communication styles or preferences between sexes might explain why females feel more comfortable or effective in certain communication tasks, such as lipreading, despite the lack of significant objective differences. It is important to note that these interpretations are speculative. The observed differences might also stem from the varying complexity of the tasks evaluated. In the fMRI study, simpler sentences and words were used, which may suggest that females generally perform better with simpler material (in line with71). Conversely, more complex tasks, like the objective measure involving a longer narrative, might pose greater challenges, potentially explaining the lack of significant differences in performance on these tasks. This aspect warrants further investigation to understand the underlying factors more comprehensively.
Conclusions
Our results revealed key cortical areas involved in visual speech processing. Modality plays a pivotal role in language recognition, influencing neural engagement. We observed that the absence of auditory input led to enhanced activation of language-related brain regions, indicating heightened processing effort when relying solely on visual cues. Notably, key areas such as the IFG/Broca's area, ATL, pSTS, pMTG, and the left middle and superior temporal gyri were actively engaged, underscoring the importance of visual speech reception as a significant modality in language comprehension. The visual system's significant role in lipreading, as a multimodal process, was emphasized.
our ndings also contribute to the discussion on sex dierences in lipreading skill, nding no signicant
dierences on behavioral and neurobiological levels, challenging previous research suggesting such dierences.
Subjective reading comprehension level varied between sexes, and perceived dierences in lipreading ability
may be more related to cultural and societal inuences rather than inherent neurological distinctions. Overall,
our study provides new insights into the neural mechanisms underlying visual-only lipreading and audiovisual
language perception and sheds light on the functional dierences between these two modes of speech perception.
ese ndings may have important implications for hearing loss rehabilitation, speech recognition technologies,
Scientic Reports | (2025) 15:13253 9
| https://doi.org/10.1038/s41598-025-98026-8
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
and cross-linguistic communication. ey highlight the need for further research to better understand the
neural and cognitive bases of lipreading.
In conclusion, our ndings shed light on neural processes in language comprehension, emphasizing modality,
voice impact, and cultural inuences. Implications include understanding language disorders, brain function,
and developing assistive technologies.
Limitations
Our study had several limitations that may have impacted the outcomes and their interpretation. First, for the behavioral measure of lipreading skill, we predominantly focused on higher-level comprehension, i.e., we examined participants' skill using only continuous text rather than isolated words or sentences, likely overlooking critical aspects of lipreading at more basic levels. This may have prevented us from capturing important variations in lipreading abilities among participants who may or may not struggle with fundamental skills.
Another limitation was that we used a random order of experimental conditions, and for a few participants the tasks with audio were performed before those without sound. Although this sequence went unnoticed by most participants due to the overall difficulty of the tasks, it could introduce variability in the results, decreasing the power of the fMRI analysis.
Moreover, we did not provide any lipreading training before the experiment to familiarize participants with the specificity of lipreading. We also did not investigate the linguistic capabilities of the individuals involved, which might have influenced their performance in the lipreading tasks. Future studies should address these two aspects to reduce the variability of strategies used by participants and thereby decrease the variance in the behavioral and neurocognitive strategies used during lipreading.
Data availability
Behavioral data and code used for statistical analysis are available at the OSF repository (https://colab.research.google.com/drive/1nJWiWisgWB_Uyu4Bt0sdDJldPDaeteVw?usp=sharing). Unthresholded, group-level whole-brain neuroimaging result maps are available at the public NeuroVault repository (https://osf.io/6k74t/). Raw neuroimaging data are not available due to privacy regulations.
Received: 17 July 2024; Accepted: 8 April 2025
References
1. Erber, N. P. Interaction of audition and vision in the recognition of oral speech stimuli. J. Speech Hear. Res. 12 (2), 423–425 (1969).
2. Middelweerd, M. J. & Plomp, R. The effect of speechreading on the speech-reception threshold of sentences in noise. J. Acoust. Soc. Am. 82 (6), 2145–2147 (1987).
3. Kuhl, P. K. & Meltzoff, A. N. The bimodal perception of speech in infancy. Science 218, 1138–1141 (1982).
4. Peelle, J. E. et al. Increased connectivity among sensory and motor regions during visual and audiovisual speech perception. J.
Neurosci. 42 (3), 435–442 (2022).
5. Bernstein, L. E., Jordan, N., Auer, E. T. & Eberhardt, S. P. Lipreading: A review of its continuing importance for speech recognition with an acquired hearing loss and possibilities for effective training. Am. J. Audiol., 1–17 (2022).
6. Calvert, G. A. Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cereb. Cortex. 11 (12),
1110–1123 (2001).
7. Skipper, J. I., Van Wassenhove, V., Nusbaum, H. C. & Small, S. L. Hearing lips and seeing voices: how cortical areas supporting
speech production mediate audiovisual speech perception. Cereb. Cortex. 17 (10), 2387–2399 (2007).
8. Calvert, G. A., Bullmore, E. T., Brammer, M. J., Campbell, R. & Williams, S. C. R. Activation of auditory cortex during silent
lipreading. Science 276 (5312), 593–596 (1997).
9. Campbell, R. et al. Cortical substrates for the perception of face actions: an fMRI study of the specificity of activation for seen speech and for meaningless lower-face acts (gurning). Cogn. Brain Res. 12 (2), 233–243 (2001).
10. Calvert, G. A., Campbell, R. & Brammer, M. J. Evidence from functional magnetic resonance imaging of crossmodal binding in the
human heteromodal cortex. Curr. Biol. 10 (11), 649–657 (2000).
11. Nath, A. R. & Beauchamp, M. S. Dynamic changes in superior temporal sulcus connectivity during perception of noisy audiovisual speech. J. Neurosci. 31 (5), 1704–1714 (2011).
12. Auer, E. T. Jr, Bernstein, L. E., Sungkarat, W. & Singh, M. Vibrotactile activation of the auditory cortices in deaf versus hearing adults. Neuroreport 18 (7), 645–648 (2007).
13. MacSweeney, M., Capek, C. M., Campbell, R. & Woll, B. The signing brain: the neurobiology of sign language. Trends Cogn. Sci. 12 (11), 432–440 (2008).
14. Zhu, L. L. & Beauchamp, M. S. Mouth and voice: a relationship between visual and auditory preference in the human superior temporal sulcus. J. Neurosci. 37 (10), 2697–2708 (2017).
15. Zou, T. et al. Dynamic causal modeling analysis reveals the modulation of motor cortex and integration in superior temporal gyrus during multisensory speech perception. Cogn. Neurodyn., 1–16 (2023).
16. Finney, E. M., Clementz, B. A., Hickok, G. & Dobkins, K. R. Visual stimuli activate auditory cortex in deaf subjects: evidence from
MEG. Neuroreport 14 (11), 1425–1427 (2003).
17. Cardin, V. et al. Dissociating cognitive and sensory neural plasticity in human superior temporal cortex. Nat. Commun. 4 (1), 1473 (2013).
18. Lyness, C. R., Woll, B., Campbell, R. & Cardin, V. How does visual language affect crossmodal plasticity and cochlear implant success? Neurosci. Biobehav. Rev. 37 (10), 2621–2630 (2013).
19. Moreno, A., Limousin, F., Dehaene, S. & Pallier, C. Brain correlates of constituent structure in sign language comprehension. NeuroImage 167, 151–161 (2018).
20. Erickson, L. C. Examinations of audiovisual speech processes, the McGurk effect and the heteromodal superior temporal sulcus in the human brain across numerous approaches (Doctoral dissertation, Georgetown University) (2016).
21. Gao, C. et al. Audiovisual integration in the human brain: a coordinate-based meta-analysis. Cereb. Cortex. 33 (9), 5574–5584
(2023).
22. Merabet, L. B. & Pascual-Leone, A. Neural reorganization following sensory loss: the opportunity of change. Nat. Rev. Neurosci. 11
(1), 44–52 (2010).
23. Bernstein, L. E., Auer Jr, E. T. & Moore, J. K. Convergence or association. Handbook of multisensory processes, 203–220. (2004).
Scientic Reports | (2025) 15:13253 10
| https://doi.org/10.1038/s41598-025-98026-8
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
24. Möttönen, R. et al. Perceiving identical sounds as speech or non-speech modulates activity in the left posterior superior temporal sulcus. Neuroimage 30 (2), 563–569 (2006).
25. Bavelier, D. & Neville, H. J. Cross-modal plasticity: where and how? Nat. Rev. Neurosci. 3 (6), 443–452 (2002).
26. Finney, E. M., Fine, I. & Dobkins, K. R. Visual stimuli activate auditory cortex in the deaf. Nat. Neurosci. 4 (12), 1171–1173 (2001).
27. Magnotti, J. F. & Beauchamp, M. S. The noisy encoding of disparity model of the McGurk effect. Psychon. Bull. Rev. 22, 701–709 (2015).
28. Basu Mallick, D., Magnotti, J. F. & Beauchamp, M. S. Variability and stability in the McGurk effect: contributions of participants, stimuli, time, and response type. Psychon. Bull. Rev. 22, 1299–1307 (2015).
29. Tye-Murray, N., Spehar, B., Myerson, J., Hale, S. & Sommers, M. Lipreading and audiovisual speech recognition across the adult
lifespan: implications for audiovisual integration. Psychol. Aging. 31 (4), 380 (2016).
30. MacLeod, A. & Summerfield, Q. Quantifying the contribution of vision to speech perception in noise. Br. J. Audiol. 21 (2), 131–141 (1987).
31. Bosworth, R. G. & Dobkins, K. R. The effects of spatial attention on motion processing in deaf signers, hearing signers, and hearing nonsigners. Brain Cogn. 49 (1), 152–169 (2002).
32. Jones, J. A. & Callan, D. E. Brain activity during audiovisual speech perception: an fMRI study of the McGurk effect. Neuroreport 14 (8), 1129–1133 (2003).
33. Ruytjens, L., Albers, F., Van Dijk, P., Wit, H. & Willemsen, A. Neural responses to silent lipreading in normal hearing male and
female subjects. Eur. J. Neurosci. 24 (6), 1835–1844 (2006).
34. Ruytjens, L., Albers, F., Van Dijk, P., Wit, H. & Willemsen, A. Activation in primary auditory cortex during silent lipreading is
determined by sex. Audiol. Neurotology. 12 (6), 371–377 (2007).
35. Saalasti, S. et al. Lipreading a naturalistic narrative in a female population: neural characteristics shared with listening and reading.
Brain Behav., 13(2), e2869. (2023).
36. Sumby, W. H. & Pollack, I. Visual contribution to speech intelligibility in noise. J. Acoust. Soc. Am. 26 (2), 212–215 (1954).
37. Sekiyama, K. & Tohkura, Y. I. Inter-language differences in the influence of visual cues in speech perception. J. Phonetics 21 (4), 427–444 (1993).
38. Auer, E. T. Jr & Bernstein, L. E. Speechreading and the structure of the lexicon: computationally modeling the effects of reduced phonetic distinctiveness on lexical uniqueness. J. Acoust. Soc. Am. 102 (6), 3704–3710 (1997).
39. Walden, B. E., Montgomery, A. A., Prosek, R. A. & Hawkins, D. B. Visual biasing of normal and impaired auditory speech
perception. J. Speech Lang. Hear. Res. 33 (1), 163–173 (1990).
40. Bernstein, L. E., Auer, E. T. Jr. & Takayanagi, S. Auditory speech detection and visual speech detection: effects of training and visual cues. Speech Commun. 32 (1–2), 73–80 (2000).
41. Grosbras, M. H., Laird, A. R. & Paus, T. Cortical regions involved in eye movements, shifts of attention, and gaze perception. Hum. Brain Mapp. 25 (1), 140–154 (2005).
42. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
(2021). https://www.R-project.org/
43. Ashburner, J. & Friston, K. J. Unified segmentation. Neuroimage 26 (3), 839–851 (2005).
44. Xia, M., Wang, J. & He, Y. BrainNet Viewer: a network visualization tool for human brain connectomics. PLoS One 8 (7), e68910 (2013).
45. Engell, A. D. & Haxby, J. V. Facial expression and gaze-direction in human superior Temporal sulcus. Neuropsychologia 45 (14),
3234–3241 (2007).
46. Watson, R., Latinus, M., Charest, I., Crabbe, F. & Belin, P. People-selectivity, audiovisual integration and heteromodality in the
superior Temporal sulcus. Cortex 50, 125–136. https://doi.org/10.1016/j.cortex.2013.07.011 (2014).
47. Duncan, J. & Owen, A. M. Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends Neurosci.
23 (10), 475–483 (2000).
48. Hickok, G., Houde, J. & Rong, F. Sensorimotor integration in speech processing: computational basis and neural organization.
Neuron 69 (3), 407–422 (2011).
49. Hickok, G. e functional neuroanatomy of Language. Phys. Life. review6 (3), 121–143 (2009).
50. Carbo, A. V. et al. Tractography of supplementary motor area projections in progressive speech apraxia and aphasia. NeuroImage:
Clin. 34, 102999 (2022).
51. Pinson, H. et al. e supplementary motor area syndrome: a neurosurgical review. Neurosurg. Rev. 45, 81–90 (2022).
52. Binder, J. R. et al. Toward a brain-based componential semantic representation. Cognit. Neuropsychol. 33 (3–4), 130–174 (2016).
53. Visser, M., Embleton, K. V., Jeeries, E., Parker, G. J. & Ralph, M. L. e inferior, anterior Temporal lobes and semantic memory
claried: novel evidence from distortion-corrected fMRI. Neuropsychologia 48 (6), 1689–1696 (2010).
54. Putzar, L. et al. e neural basis of lip-reading capabilities is altered by early visual deprivation. Neuropsychologia 48 (7), 2158–2166
(2010).
55. Paulesu, E., Perani, D., Blasi, V., Silani, G., Borghese, N. A., De Giovanni, U.,… Fazio, F. (2003). A f unctional-anatomical model for
lipreading. Journal of neurophysiology, 90(3), 2005–2013.
56. Campbell, R. Read the Lips: Speculations on the. Relations of Language and ought: e View from Sign Language and Deaf
Children, 110. (1997).
57. Zeki, S. et al. A direct demonstration of functional specialization in human visual cortex. J. Neurosci. 11 (3), 641–649 (1991).
58. Bernstein, L. E. & Liebenthal, E. Neural pathways for visual speech perception. Front. NeuroSci. 8, 386 (2014).
59. Hauswald, A., Lithari, C., Collignon, O., Leonardelli, E. & Weisz, N. A visual cortical network for deriving phonological information
from intelligible lip movements. Curr. Biol. 28 (9), 1453–1459 (2018).
60. Weber, S., Hausmann, M., Kane, P. & Weis, S. e relationship between Language ability and brain activity across Language
processes and modalities. Neuropsychologia 146, 107536 (2020).
61. Prat, C. S., Mason, R. A. & Just, M. A. Individual dierences in the neural basis of causal inferencing. Brain Lang. 116 (1), 1–13
(2011).
62. Van Ettinger-Veenstra, H., McAllister, A., Lundberg, P., Karlsson, T. & Engström, M. Higher Language ability is related to angular
gyrus activation increase during semantic processing, independent of sentence incongruency. Front. Hum. Neurosci. 10, 110
(2016).
63. Buchweitz, A., Mason, R. A., Meschyan, G., Keller, T. A. & Just, M. A. Modulation of cortical activity during comprehension of
familiar and unfamiliar text topics in speed reading and speed listening. Brain Lang. 139, 49–57 (2014).
64. St George, M., Kutas, M., Martinez, A. & Sereno, M. I. Semantic integration in reading: engagement of the right hemisphere during
discourse processing. Brain 122 (7), 1317–1325 (1999).
65. Haier, R. J. et al. Regional glucose metabolic changes aer learning a complex visuospatial/motor task: a positron emission
tomographic study. Brain Res. 570 (1–2), 134–143 (1992).
66. Welcome, S. E. & Joanisse, M. F. Individual dierences in skilled adult readers reveal dissociable patterns of neural activity
associated with component processes of reading. Brain Lang. 120 (3), 360–371 (2012).
67. Irwin, J. R., Whalen, D. H. & Fowler, C. A. A sex dierence in visual inuence on heard speech. Percept. Psychophys. 68, 582–592
(2006).
68. Watson, C. S., Qiu, W. W., Chamberlain, M. M. & Li, X. Auditory and visual speech perception: conrmation of a modality-
independent source of individual dierences in speech recognition. J. Acoust. Soc. Am. 100 (2), 1153–1162 (1996).
Scientic Reports | (2025) 15:13253 11
| https://doi.org/10.1038/s41598-025-98026-8
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved
69. Jaeger, J. J. et al. Sex dierences in brain regions activated by grammatical and reading tasks. Neuroreport 9 (12), 2803–2807 (1998).
70. Eagly, A. H. & Steen, V. J. Gender stereotypes stem from the distribution of women and men into social roles. J. Personal. Soc.
Psychol., 46(4) (1984).
71. Ceuleers, D. et al. e eects of age, gender and test stimuli on visual speech perception: a preliminary study. Folia Phoniatr. Et
Logopaedica. 74 (2), 131–140 (2022).
Acknowledgements
We thank all participants of the current study. For help with recruitment and data collection, we thank Marta Zbysińska and Julia Kołakowska. We also thank Maciej Nowicki, Anna Skoczylas and Zuzanna Pankowska for their substantive support in preparing the training materials. This work was funded by the Polish National Science Centre grant (2016/20/W/NZ4/00354).
Author contributions
JW: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Visualization, Writing - Review & Editing; JB: Conceptualization, Formal Analysis, Methodology, Visualization, Writing - Original Draft Preparation, Review & Editing; HC: Conceptualization, Writing - Review & Editing; AP: Conceptualization, Writing - Review & Editing; TW: Conceptualization, Resources, Methodology, Writing - Review & Editing.
Declarations
Competing interests
JW provides consulting services to NordicNeuroLab AS, which manufactures some of the add-on equipment
used during data acquisition. All other authors declare no conflicts of interest.
Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-025-98026-8.
Correspondence and requests for materials should be addressed to J.B.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access is article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and
indicate if changes were made. The images or other third party material in this article are included in the article’s
Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included
in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or
exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy
of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2025
Scientic Reports | (2025) 15:13253 12
| https://doi.org/10.1038/s41598-025-98026-8
www.nature.com/scientificreports/
Content courtesy of Springer Nature, terms of use apply. Rights reserved