
Modality-independent recruitment of inferior frontal cortex during speech processing in human infants


Abstract

Despite increasing interest in the development of audiovisual speech perception in infancy, the underlying mechanisms and neural processes are still only poorly understood. In addition to regions in temporal cortex associated with speech processing and multimodal integration, such as superior temporal sulcus, left inferior frontal cortex (IFC) has been suggested to be critically involved in mapping information from different modalities during speech perception. To further illuminate the role of IFC during infant language learning and speech perception, the current study examined the processing of auditory, visual and audiovisual speech in 6-month-old infants using functional near-infrared spectroscopy (fNIRS). Our results revealed that infants recruit speech-sensitive regions in frontal cortex including IFC regardless of whether they processed unimodal or multimodal speech. We argue that IFC may play an important role in associating multimodal speech information during the early steps of language learning.
Developmental Cognitive Neuroscience
Nicole Altvater-Mackensen, Tobias Grossmann
Department of Psychology, Johannes-Gutenberg-University Mainz, Germany
Department of Psychology, University of Virginia, USA
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
Keywords: Infant speech perception; Modality differences; Inferior frontal cortex
1. Introduction
Language processing starts long before infants utter their first words. Already in utero, babies start to extract rhythmic regularities of the ambient language: newborns prefer to listen to speech with prosodic characteristics of the native language, and their crying matches their native language's stress pattern (Mampe et al., 2009; Mehler et al., 1988; Moon et al., 1993). With only little experience, infants discriminate a range of speech sound contrasts (e.g., Eimas et al., 1971; Werker and Tees, 1984) and match auditory and visual speech cues (e.g., Kuhl and Meltzoff, 1982; Patterson and Werker, 2003). They considerably refine and extend this knowledge in the first year of life and attune perception to the characteristics of their native language. This native language attunement is evidenced by changes in the perception of speech sounds, particularly by a reduced sensitivity to non-native sound contrasts and enhanced sensitivity to native sound contrasts (for reviews, see Maurer and Werker, 2014; Jusczyk, 1998). However, most work on the development of infant speech perception has focused on the auditory domain, even though large portions of the language input to babies are conveyed through multimodal face-to-face communication.
Indeed, it has been argued that infants exploit visual and social cues inherent in multimodal speech to facilitate language learning and processing (e.g., Kuhl, 2007; Csibra and Gergely, 2009). In a seminal study, Kuhl and Meltzoff (1982) showed that infants process cross-modal speech information. In this study, when 18- to 20-week-old infants were presented with two articulating faces side by side and an auditory stream that matched one of the visual articulations, infants preferred to look at the matching face. Subsequent work revealed that despite this early sensitivity to multimodal speech cues, audiovisual speech perception develops considerably during infancy, and this development further extends into childhood. Attunement to the native language can also be observed for audio-visual speech. While young infants are still sensitive to the match between auditory and visual speech cues for both familiar and unfamiliar languages, towards the end of the first year of life they lose sensitivity to the cross-modal segmental match of non-native audiovisual speech (Pons et al., 2009; Kubicek et al., 2014; Shaw et al., 2015). Similarly, 4- to 6-month-olds discriminate their native from a non-native language when presented with silent articulations, based on visual speech cues alone, whereas monolingual 8-month-olds no longer do so (Weikum et al., 2007). Nevertheless, visual cues continue to be exploited for phonetic learning as they can be used to enhance sound discrimination in infants and adults (Teinonen et al., 2008; Ter Schure et al., 2016; Mani and Schneider, 2013; but see Danielson et al., 2017). Indeed, infants appear
to actively seek out visual speech information by increasing attention to a speaker's mouth during the phase of native language attunement (Lewkowicz and Hansen-Tift, 2012; Tenenbaum et al., 2013), and such attention has been associated with more mature speech processing (Kushnerenko et al., 2013). Taken together, these studies demonstrate that infants associate speech sound information from different modalities, suggesting that phonemic representations are inherently multimodal (see also Bristow et al., 2008). They do not, however, necessarily imply that infants automatically integrate information from audition and vision during speech processing (see Shaw and Bortfeld, 2015).

Received 27 July 2017; Received in revised form 25 August 2018; Accepted 25 October 2018. Corresponding author at: Department of Psychology, Johannes-Gutenberg-University Mainz, Binger Str. 14-16, 55122 Mainz, Germany. E-mail address: (N. Altvater-Mackensen).
Evidence for limited audiovisual integration early in development comes from studies showing reduced sensitivity to the temporal synchrony of auditory and visual speech streams in infants (e.g., Lewkowicz, 2010) and from studies finding considerable variation in the likelihood of fusing auditory and visual speech cues into one percept until well into childhood (Desjardins and Werker, 2004; McGurk and MacDonald, 1976). Interestingly, the developing ability to map auditory and visual speech cues seems to be modulated by articulatory knowledge (Desjardins et al., 1997; Mugitani et al., 2008; Altvater-Mackensen et al., 2016), which has been taken to suggest a role for sensorimotor information in phonological learning and processing (for a recent discussion see Guellai et al., 2014).
Despite this growing body of behavioural work, research on the neural processing of audiovisual speech in infants is relatively scarce, and the neural underpinnings of infant audiovisual speech perception are still only poorly understood. Most studies investigating the neural correlates of infant speech perception have focused on auditory speech perception and on activation patterns elicited in temporal brain areas involved in processing auditory stimuli (for reviews, see Minagawa-Kawai et al., 2011; Rossi et al., 2012). Only more recently has research begun to explore (a) the processing of multimodal input and (b) the recruitment of frontal brain regions during infant speech perception (e.g., Altvater-Mackensen and Grossmann, 2016). The former is particularly relevant because most language input to infants is inherently multimodal and, as pointed out above, this multimodality seems to be reflected in infants' phonemic representations. Understanding the processes of audiovisual speech perception is thus crucial for an ecologically valid account of infant speech perception. The latter seems particularly interesting in the light of behavioural findings suggesting a link between articulatory and perceptual abilities. There is a long-standing debate on the potential influence of production processes on speech perception and its link to frontal brain regions, specifically the (left) inferior frontal cortex (for reviews see Galantucci et al., 2006; Poeppel and Monahan, 2011). Investigating the involvement of frontal brain regions during speech processing in infants seems especially relevant as the perceptuo-motor link is thought to be rooted in early language development: infants might use perception to guide and develop their production (Hickok and Poeppel, 2007), they might use their motor knowledge to interpret phonemic information in perception (Pulvermüller and Fadiga, 2010), and/or they might form multimodal representations by establishing a mapping between auditory and articulatory information (Westermann and Miranda, 2004). The current study builds on this line of work and investigates infants' processing of unimodal and multimodal speech in frontal brain regions using functional near-infrared spectroscopy (fNIRS) to add to our understanding of the neural processes underlying infant speech perception.
Before we discuss previous research with infants in more detail, we will briefly summarise relevant findings from the adult literature to put the current study into context. Research with adults has shown that visual speech processing (i.e., silent lip reading) recruits areas in auditory cortex (Calvert et al., 1997; Sams et al., 1991). The response to speech in sensory cortices, such as auditory cortex, and in areas associated with multisensory processing, such as the superior temporal sulcus (STS), is enhanced during audiovisual as compared to unimodal speech perception (Calvert et al., 1999; see also Callan et al., 2001). Further, Broca's area and left inferior prefrontal cortex seem to be critically involved in the processing of audiovisual speech. Both regions are activated more strongly in response to congruent as compared to incongruent audiovisual speech (Ojanen et al., 2005; see also Skipper et al., 2007). While activation in Broca's area is thought to reflect the mapping of auditory cues onto motor representations (e.g., Wilson et al., 2004), activation in prefrontal cortex is thought to reflect processes of attention allocation (e.g., Miller and Cohen, 2001). In sum, these studies show enhanced processing of audiovisual as compared to unimodal speech in sensory areas, the STS, and left inferior frontal and prefrontal cortex. This provides a neural substrate for behavioural improvements in speech perception when a speaker's mouth is visible (e.g., Schwartz et al., 2004) and concurs with findings that auditory processing is affected by visual cues and vice versa (see also McGurk and MacDonald, 1976). While temporal areas, and specifically the STS, have been demonstrated to be involved in binding and integrating information from different modalities (e.g., Beauchamp et al., 2004; but see Hocking and Price, 2008), the role of frontal regions during speech perception is discussed more controversially. In particular, enhanced activation in left inferior frontal cortex has been taken to reflect the mapping of speech percepts onto motor schemes, either through a direct action-perception link (e.g., Pulvermüller and Fadiga, 2010) or in the context of a predictive coding mechanism (e.g., Hickok et al., 2011). Regardless of the proposed nature of sensorimotor mapping, the reported frontal effects in adult speech perception are commonly assumed to result from the association of speech cues from different modalities during language development (see Dick et al., 2010).
Indeed, neuroimaging studies with infants show activation of left inferior frontal regions in response to speech already in newborns and 3-month-old infants (Dehaene-Lambertz et al., 2002, 2006; Peña et al., 2003). Specifically, over the course of the first year of life, Broca's area has been shown to become more prominently recruited during speech processing, suggesting the establishment of a perceptuo-motor link for speech categories during early language development (Imada et al., 2006; Kuhl et al., 2014; Perani et al., 2011). Yet, only very few studies explicitly tested infants' processing of multimodal speech cues. When auditory speech is presented alongside a visual non-speech stimulus, such as a checkerboard pattern, 2- to 4-month-olds show similar activation of sensory cortices for unimodal and multimodal stimuli (Taga and Asakawa, 2007; Taga et al., 2003). This is thought to indicate that early in ontogeny auditory and visual components are processed in separate neural circuits with little cross-talk between sensory regions (but see Watanabe et al., 2013). In contrast, 6- to 9-month-olds show left-lateralised enhancement of activation in temporal areas in response to auditory speech paired with a visual stimulus as compared to auditory speech alone (Bortfeld et al., 2007, 2009). Taken together, these results suggest that enhanced processing of multimodal speech might rely on language experience. However, the visual cues in these studies were non-linguistic. Thus, it is unclear if the results generalise to the processing of audiovisual speech. Yet, similarly enhanced processing for synchronous multimodal as compared to unimodal or asynchronous speech has been reported in ERP studies with 3- and 5-month-old infants (Hyde et al., 2010, 2011; Reynolds et al., 2014). Moreover, Fava et al. (2014) report increased left-lateralised activation in response to audiovisual native as compared to non-native speech in temporal brain regions by the end of the first year of life, providing evidence for a neural correlate of perceptual native language attunement in the audiovisual domain (for similar findings with auditory-only speech see, e.g., Kuhl et al.).
While these studies suggest improved processing of native audiovisual as compared to auditory-only and non-native speech, no study so far has directly compared the processing of auditory, visual and audiovisual speech in infants. Furthermore, brain responses were only assessed over sensory cortices and temporal regions. This concurs with well-established findings in the adult literature highlighting the importance of temporal regions in speech processing and, specifically, the role of STS in binding information from different modalities during speech perception (e.g., Nath and Beauchamp, 2012; Baum et al., 2012). Yet, the
results cannot speak to the potential role of inferior frontal cortex (IFC) in processing speech information from different modalities during language development. Currently, only one study has directly assessed infants' recruitment of frontal cortex during audiovisual speech perception (Altvater-Mackensen and Grossmann, 2016). This study reported enhanced activation of regions in IFC in the left hemisphere in response to congruent but not incongruent native audiovisual speech in 6-month-old infants. Furthermore, results from this study show that infants' response to audiovisual speech in inferior frontal brain regions is impacted by their general attention to a speaker's mouth during speech perception. This finding is in line with the notion that left IFC is involved in mapping information from different modalities during infant speech perception.
To further illuminate the role of IFC during infant language learning and speech perception, the current study examined infants' processing of auditory, visual and audiovisual speech using fNIRS. In particular, we investigated 6-month-olds' neural response to speech across modalities at frontal and prefrontal sites in both hemispheres. Prefrontal sites were included because prefrontal cortex has been suggested to be involved in processes of attention control during audiovisual speech perception in adults (Ojanen et al., 2005) and in the processing of socially, but not necessarily linguistically, relevant aspects of speech in infants (Dehaene-Lambertz et al., 2010; Naoi et al., 2012). We hypothesised that infants might differentially recruit areas in IFC in response to multimodal as compared to unimodal speech, reflecting differences in sensory stimulation, attention and/or task demands (for a discussion of crossmodal, additive effects in adults see Calvert, 2001). This hypothesis is also based on previous reports of enhanced temporal activation for multimodal stimuli in infants (Bortfeld et al., 2007, 2009). Indeed, if areas in IFC are associated with integration processes during audiovisual speech perception (e.g., Dehaene-Lambertz et al., 2002, 2006), activation should be stronger for multimodal as compared to unimodal speech, because only audiovisual speech input requires the evaluation and integration of information from different modalities. However, it has also been suggested that activation of areas in left IFC, such as Broca's area, during auditory speech perception reflects the mapping of motor schemes onto speech percepts (e.g., Kuhl et al., 2014). This implies that speech perception leads to the automatic retrieval and processing of information from different domains, such as auditory, visual and motoric information, regardless of the modality of the input (see also Westermann and Miranda, 2004). That is, infants might retrieve (visuo-)motoric information not only when they are presented with a talking face but also when they are presented with auditory-only speech (as suggested by the motor theory of speech perception, for instance; see Liberman, 1957). If so, processing demands should be similar across modalities and we might thus find similar patterns of activation for unimodal and multimodal speech.
2. Method and materials
2.1. Participants
Twenty-eight German 5.5- to 6-month-olds (13 girls) from a
monolingual language environment participated in the experiment (age
range: 5;15 (months; days) to 6;0, mean age 5;24). Infants were re-
cruited via a large existing infant and child database at the Max Planck
Institute for Human Cognitive and Brain Sciences in Leipzig, Germany.
All infants were born full term with normal birth weight (> 2500 g)
and had no reported hearing or vision impairment. Eight additional
infants could not be tested because they started to cry, four additional
infants had to be excluded due to technical failure during fNIRS re-
cording, and three additional infants were later excluded from analysis
because they contributed less than 50% of valid data (see data ana-
lysis). Parents gave written informed consent to participate in the study
and received 7.50 Euro and a toy for their infant for participation. The
study was approved by the local ethics committee and conducted
according to the Declaration of Helsinki.
2.2. Stimuli
Stimuli were adapted from a previous study testing infants' audio-
visual speech perception (Altvater-Mackensen and Grossmann, 2016).
Speech stimuli consisted of audiovisual recordings of a female native
speaker of German, uttering the vowels /a/, /e/ and /o/ in hyper-articulated, infant-directed speech (for details on the visual and acoustic
characteristics see Altvater-Mackensen et al., 2016). For each vowel,
two stimulus videos were created that contained three successive repetitions of the respective vowel. Each utterance started and ended with
the mouth completely shut in neutral position. Each vowel articulation
was separated by approximately three seconds in which the woman
kept a friendly facial expression, leading to a video length of 15 s. The
eye-gaze was always directed towards the infant. All videos were
zoomed and cropped so that they only showed the woman's head
against a light-grey wall. Video frames were 1024 pixels wide and 1000
pixels high, resulting in a width of 27 cm and a height of 26 cm on
screen. For the auditory-only stimuli, the visual stream of the videos
was replaced by a blank (black) screen. For the visual-only stimuli, the
auditory stream of the videos was replaced by silence. Three additional
example trials were created using dierent recordings of the same
woman uttering each vowel twice in a block of three repetitions fol-
lowed by an engaging smile and raise of her eyebrows. Each example
stimulus had a length of approximately 45 s.
Non-speech stimuli were created to mimic the speech stimuli. Each
stimulus consisted of three successive repetitions of a complex sound,
accompanied by a time-locked visualisation. Non-speech sounds were
three different melodies that matched the three speech sounds in length and volume and represented a bouncing ball, a ringing bell, and a whistle. Non-speech visual stimuli were created using the iTunes visualiser and showed visual objects (light bubbles) against a black
background that changed in colour and intensity corresponding to the
sound stimuli. Auditory and visual streams of the non-speech stimuli
were contingent, i.e. bubble explosions were time-locked to the sounds,
to mimic the synchrony between auditory and visual speech streams in
the speech stimuli. Timing of the sounds and video length of the non-
speech stimuli were matched to the vowel stimuli, leading to a stimulus
length of 15 s. Video frames were 1024 pixels wide and 1000 pixels
high, resulting in a width of 27 cm and a height of 26 cm on screen.
Again, auditory-only stimuli were created by replacing the visual
stream of the audiovisual videos with a blank (black) screen, and visual-
only stimuli were created by replacing the auditory stream of the
audiovisual videos with silence. Fig. 1 shows an example frame of the mouth position for each of the fully articulated vowels and examples of the sounds' spectrograms (Fig. 1A), and an example frame of the visual objects used in the non-speech stimuli with the spectrogram of the corresponding sound (Fig. 1B).
2.3. Procedure
Infants were seated on their parent's lap in a quiet experimental
room, facing a 52 cm wide and 32.5 cm high TV screen at a distance of
60 cm from the screen. Visual stimuli were presented on screen.
Auditory stimuli were presented via loudspeakers that were located
behind the screen. Infants were first presented with the three example videos showing the woman uttering /a/, /e/ and /o/, to introduce infants to the testing situation and to the speaker and her characteristics.
Infants were then presented with a maximum of 18 speech–non-speech sequences. Each sequence presented one of six different auditory, visual and audiovisual speech videos (two videos per vowel and modality), immediately followed by a modality-matched non-speech video. Speech and non-speech videos were paired so that each specific speech video was always followed by the same modality-matched non-speech video. Sequences were pseudo-randomised so that no more than two
consecutive sequences belonged to the same modality, and so that no more than two consecutive sequences contained the same vowel. Thus, stimuli from the three different modalities were not presented in a blocked design but intermixed throughout the experiment. Although this might decrease the likelihood of task-dependent, modality-specific processing, pilot testing showed that an intermixed presentation increased infants' interest in the stimuli and considerably reduced dropout rates. To ensure that infants looked at the screen, each sequence was started manually by the experimenter when the infant attended to the screen, resulting in variable inter-stimulus intervals (median = 8 s, range 1–60 s; note that this excludes longer breaks which were occasionally taken to reposition or cheer up the infant). On average, the experiment took approximately 15 min.
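The pseudo-randomisation constraints on sequence order (no more than two consecutive sequences of the same modality or the same vowel) can be sketched via simple rejection sampling. This is an illustrative reconstruction, not the authors' actual randomisation script, and the condition labels are placeholders:

```python
import random

def valid(order):
    """True if no three consecutive sequences share a modality or a vowel."""
    for i in range(len(order) - 2):
        window = order[i:i + 3]
        if len({s[0] for s in window}) == 1:  # three in a row, same modality
            return False
        if len({s[1] for s in window}) == 1:  # three in a row, same vowel
            return False
    return True

def pseudo_randomise(seed=None):
    """Shuffle the 18 speech/non-speech sequences until the order is valid."""
    rng = random.Random(seed)
    # 3 modalities x 3 vowels x 2 videos = 18 sequences
    items = [(modality, vowel, video)
             for modality in ("audiovisual", "auditory", "visual")
             for vowel in ("a", "e", "o")
             for video in (1, 2)]
    while True:  # rejection sampling: reshuffle until the constraints hold
        rng.shuffle(items)
        if valid(items):
            return items

order = pseudo_randomise(seed=7)
assert len(order) == 18 and valid(order)
```

Rejection sampling is the simplest way to satisfy such adjacency constraints; with 18 items spread evenly over three modalities and three vowels, a valid order is found after only a few shuffles.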
2.4. Data acquisition
A camera mounted below the screen recorded infants' behaviour during the experiment to allow offline coding of attention and movement throughout the experiment. The speech–non-speech sequences were presented using Presentation software (Neurobehavioral Systems), and infants' fNIRS data were recorded using a NIRx NIRScout system and NIRStar acquisition software. The NIRS method relies on determining changes in oxygenated and deoxygenated haemoglobin concentration in cerebral cortex based on their different absorption spectra for near-infrared light (for a detailed description see Lloyd-Fox et al., 2010). Data were recorded from 16 source-detector pairs, placed at a distance of approximately 2.5 cm within an elastic cap (Easycap), in order to record brain responses from anterior and inferior frontal brain regions. The source-detector arrangement resulted in a total of 49 channels, placed with reference to the 10–20 system (see Fig. 2 for details). To ensure comparable placement of channels irrespective of infants' head size, several caps of different sizes were prepared with optode holders. Based on an individual infant's head size, the best-fitting cap was used for testing. Data were recorded at a sampling rate of 6.25 Hz. The near-infrared lasers used two wavelengths, 760 nm and 850 nm, with a power of 5 mW per wavelength. Light intensity was automatically adjusted by the NIRS recording system to provide optimal signal quality.
2.5. Data analysis
Infants' attention to the speech and non-speech stimuli and their movements during fNIRS recordings were coded offline from video. If an infant looked away from the screen for more than 5 s, that is, for more than one third of a stimulus video, the data for this particular stimulus were excluded from further analysis. If an infant showed severe head movement during presentation of a stimulus video, which resulted in movement artefacts in the data (based on visual inspection), the data for this particular stimulus were also excluded. Three infants were excluded from analysis because they did not contribute data for at least 50% of the stimuli according to these criteria (for similar rejection criteria, see Altvater-Mackensen and Grossmann, 2016). The final sample consisted of data from 28 infants who contributed data from 15 speech sequences on average (range 9–18). Infants tended to look away more often during auditory-only sequences and consequently contributed fewer auditory trials to the analysis compared to audiovisual trials (audiovisual: mean 5.36, SE 0.18, range 3–6; auditory: mean 4.61, SE 1.1, range 2–6; visual: mean 5.04, SE 1.04, range 3–6; audiovisual vs. auditory: t(27) = 2.938, p = .01; other ps > .05).
The fNIRS data were analysed using the Matlab-based software nilab2 (see Grossmann et al., 2010, for previously published fNIRS data using this analysis software). Data were filtered with a 0.2 Hz low-pass filter to remove fluctuations that were too fast, and with a 30 s high-pass filter to remove changes that were too slow, to be related to the experimental stimuli. Using a 15 s time window (equalling the length of each speech and non-speech sequence), measurements were converted into oxygenated haemoglobin (oxyHb) and deoxygenated haemoglobin (deoxyHb) concentrations using the modified Beer-Lambert law. We then calculated changes in oxyHb and deoxyHb concentration in response to speech relative to the non-speech baseline (for a similar method applied to fNIRS data obtained from infants of similar ages, see Grossmann et al., 2008, 2010; Altvater-Mackensen and Grossmann, 2016). Note that we used modality-matched non-speech sequences as a baseline rather than silence and a blank screen, so that any (relative) change in oxyHb and deoxyHb concentration can be interpreted as a response to the speech stimuli rather than to sensory auditory, visual or audiovisual stimulation per se (but see 4. Discussion for alternative interpretations). Pilot testing further showed that a modality-matched baseline considerably reduced infant movement and fussiness compared to a baseline without stimulation (for the use of a similar baseline, see Altvater-Mackensen and Grossmann, 2016). For subsequent statistical analysis, we averaged the resulting concentration changes in oxyHb and deoxyHb by participant for each channel. Note that we report results on concentration changes in deoxyHb, but that it is not unusual for studies with infants to find no or inconsistent changes in deoxyHb concentration in response to functional stimuli (cf. Lloyd-Fox et al., 2010; Meek, 2002).
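The preprocessing steps described above (band-pass filtering between a 30 s high-pass and a 0.2 Hz low-pass, then conversion to oxyHb/deoxyHb concentration changes via the modified Beer-Lambert law) can be sketched as follows. This is an illustrative reconstruction, not the nilab2 code used in the study; in particular, the extinction coefficients and the differential path-length factor (DPF) are placeholder values:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 6.25  # fNIRS sampling rate in Hz, as in the study

def band_pass(signal, fs=FS, low=1 / 30, high=0.2):
    """Remove drifts slower than 30 s and fluctuations faster than 0.2 Hz."""
    b, a = butter(2, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal, axis=0)

# Placeholder extinction coefficients [oxyHb, deoxyHb] for 760 nm and 850 nm;
# a real analysis would use tabulated values for the exact laser wavelengths.
EXTINCTION = np.array([[1.4866, 3.8437],   # 760 nm
                       [2.5264, 1.7986]])  # 850 nm

def mbll(intensity, distance_cm=2.5, dpf=5.0):
    """Modified Beer-Lambert law: light intensities (time x 2 wavelengths)
    -> relative oxyHb/deoxyHb concentration changes (time x 2)."""
    # optical density change relative to the mean intensity per wavelength
    delta_od = -np.log10(intensity / intensity.mean(axis=0))
    # invert delta_OD = E @ delta_c * (distance * DPF) for both chromophores
    return delta_od @ np.linalg.inv(EXTINCTION).T / (distance_cm * dpf)
```

In the study, the concentration changes within each 15 s window were then expressed relative to the modality-matched non-speech baseline and averaged by participant and channel.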
3. Results
Based on previous research (Altvater-Mackensen and Grossmann, 2016), we conducted one-sample t-tests on left- and right-hemispheric frontal channels to detect significant increases in oxyHb in response to speech (collapsed across modalities), in order to isolate channels of interest that covered speech-sensitive brain regions. This revealed two clusters of speech-sensitive channels: four adjacent channels in left frontal regions and seven adjacent channels in right prefrontal regions (all channels t(27) ≥ 2.539, p ≤ .017, no corrections applied). To test for potential lateralisation effects, we included the corresponding channels in the opposite hemisphere in subsequent analysis. Our analysis was thus conducted on the resulting two regions of interest (frontal and prefrontal) in each hemisphere (see Fig. 2). For each region of interest, relative concentration changes were averaged by participant and experimental condition across the relevant channels for further analysis.

Fig. 1. Stimulus examples. (A) Example frames for each of the fully articulated vowels used as speech stimuli and the corresponding sounds' spectrograms with pitch contours outlined in blue. (B) Example frame for one of the exploding bubbles used as non-speech stimuli with the corresponding sound's spectrogram.
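The channel-of-interest selection, a one-sample t-test per channel on each infant's mean oxyHb change to speech collapsed across modalities, can be sketched like this. This is an illustrative reconstruction under assumed data shapes, not the analysis code used in the study:

```python
import numpy as np
from scipy.stats import ttest_1samp

def speech_sensitive_channels(oxyhb, alpha=0.05):
    """oxyhb: array of shape (n_infants, n_channels) holding each infant's
    mean oxyHb change to speech, collapsed across modalities. Returns the
    indices of channels showing a significant increase above zero
    (two-sided one-sample t-test against zero, uncorrected)."""
    t, p = ttest_1samp(oxyhb, popmean=0.0, axis=0)
    return np.where((p < alpha) & (t > 0))[0]
```

Adjacent significant channels would then be grouped into the frontal and prefrontal clusters, with the mirror-image channels of the opposite hemisphere added for the lateralisation analysis.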
According to the NIRS channel placement with reference to the 10–20 system and the resulting anatomical correspondences (Kabdebon et al., 2014), the prefrontal regions mainly targeted the medial prefrontal cortex, whereas the frontal regions mainly targeted the inferior frontal gyrus and lower parts of the middle frontal gyrus (corresponding in placement approximately to F7/F8 for the channels located over the inferior frontal gyrus and FP1/FP2 for the channels located over the prefrontal cortex). Fig. 2 indicates the channel placement with the four regions of interest and displays the time courses of the hemodynamic responses for all channels included in the analysis.
A repeated-measures ANOVA on mean concentration changes in oxyHb in response to speech with site (frontal, prefrontal), hemisphere (left, right) and modality (audiovisual, auditory, visual) as within-subject factors revealed an interaction between site and hemisphere (F(1,27) = 8.175, p = .008, η² = .232). No other interactions or main effects reached significance (ps ≥ .21). Separate repeated-measures ANOVAs with hemisphere (left, right) and modality (audiovisual, auditory, visual) as within-subject factors revealed a main effect of hemisphere at frontal sites (F(1,27) = 5.135, p = .032, η² = .160) and prefrontal sites (F(1,27) = 6.205, p = .019, η² = .187). No other interactions or main effects reached significance (ps ≥ .46). Corresponding analysis on mean concentration changes in deoxyHb in response to speech revealed no significant main effects or interactions at frontal sites, and a main effect of hemisphere (F(1,27) = 11.874, p = .002, η² = .305) and modality (F(1,26) = 4.055, p = .029, η² = .238), but no interaction, at prefrontal sites.

Follow-up analysis showed significant increases in oxyHb concentration for speech at both left (t(27) = 4.084, p < .001, d = 0.77) and right (t(27) = 2.498, p = .019, d = 0.47) frontal sites, and both left (t(27) = 3.875, p = .001, d = 0.73) and right (t(27) = 7.365, p < .001, d = 1.39) prefrontal sites.
Concentration changes were stronger in the right than the left hemisphere for prefrontal sites (t(27) = -2.505, p = .019, d = -0.47), while there was no significant difference between hemispheres for frontal sites (p = .10, see Fig. 3.1). Corresponding analysis showed a significant decrease in deoxyHb concentration for speech at left prefrontal sites (t(27) = -3.047, p = .005, d = -0.57), but not at right prefrontal sites or at frontal sites in either hemisphere (ps > .20).

Fig. 2. Channel placement and time course plots of the hemodynamic response to speech stimuli. Dots on the topographic head mark the placement of fNIRS channels. Channels in frontal and prefrontal areas included in the regions of interest for analysis are coloured in green (frontal channels) and red (prefrontal channels). The panels show the hemodynamic response to speech for all channels included in the analysis in the left (upper panels) and right (lower panels) hemisphere. Note that not all depicted channels show significant changes in oxyHb in response to speech (see 3. Results for details). The depicted frontal channels correspond in placement approximately to F7/F8, targeting the inferior frontal gyrus and lower parts of the middle frontal gyrus (Kabdebon et al., 2014), while the depicted prefrontal channels roughly correspond in placement to FP1/FP2, targeting the medial prefrontal cortex. The graphs plot the change in oxyHb (red line) and deoxyHb (blue line) from the onset of the speech stimulus (averaged across all speech conditions) for 20 s, i.e. up to 5 s after speech stimulus offset.

The identified speech-sensitive channels spanned inferior and superior frontal brain regions. Based on channel placement, we split the resulting channels into inferior and superior clusters for initial analysis. Because there were no main effects or interactions with respect to the anatomical inferior-superior distinction (ps ≥ .09 for oxyHb and ps ≥ .07 for deoxyHb), we collapsed the data across inferior and superior channels at frontal and prefrontal sites in each hemisphere for analysis.

Note that results for the follow-up analysis and the planned comparisons remain similar when controlling for false positives in multiple comparisons through the Benjamini-Hochberg procedure with a false discovery rate of 0.05 (Benjamini and Hochberg, 1995). Controlling for the false discovery rate with the Benjamini-Hochberg procedure seems more appropriate than controlling for the familywise error rate with Bonferroni corrections given the limited power of our data set. Using the more conservative Bonferroni correction, the increase in oxyHb for speech at right frontal sites and the difference in oxyHb between

N. Altvater-Mackensen, T. Grossmann
Planned analysis on the influence of modality showed significant increases in oxyHb concentration for audiovisual (t(27) = 2.037, p = .052, d = 0.38), auditory (t(27) = 2.998, p = .006, d = 0.57) and visual (t(27) = 3.018, p = .005, d = 0.57) speech at frontal left sites, and for auditory (t(27) = 3.141, p = .004, d = 0.59) but not for audiovisual or visual (ps ≥ .45) speech at frontal right sites (see Fig. 3.2). There was no significant difference between modalities within each hemisphere (ps ≥ .07) or between hemispheres for either modality (ps ≥ .14). Corresponding analysis showed no significant changes or differences in deoxyHb concentration at frontal sites (ps > .07).
At prefrontal sites, oxyHb concentration significantly increased for audiovisual (t(27) = 3.611, p = .001, d = 0.68) and auditory (t(27) = 3.248, p = .003, d = 0.61) but not for visual (p = .17) speech in the left hemisphere, and for audiovisual (t(27) = 5.579, p ≤ .001, d = 1.05), auditory (t(27) = 5.159, p ≤ .001, d = 0.97) and visual (t(27) = 2.739, p = .011, d = 0.52) speech in the right hemisphere (see Fig. 3.3). Again, concentration changes did not differ between hemispheres for either modality (ps ≥ .08) or between modalities within each hemisphere (ps ≥ .18). Corresponding analysis showed a significant decrease in deoxyHb concentration for visual speech at prefrontal left sites (t(27) = -3.570, p = .001, d = -0.67), which was significantly different from changes in response to audiovisual (t(27) = -3.032, p = .005, d = -0.56) and auditory speech (t(27) = 2.400, p = .024, d = -0.64) and significantly stronger than in the right hemisphere (t(27) = -2.711, p = .012, d = -0.56). No other changes or differences in deoxyHb concentration at prefrontal sites reached significance (ps > .07).
Fig. 3 displays mean concentration changes in oxyHb in response to
speech at frontal and prefrontal sites for both left and right hemisphere,
collapsed across modality (Fig. 3.1) and separated by modality (Fig. 3.2
and 3.3).
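The follow-up tests reported above are one-sample t-tests of mean concentration change against zero, with Cohen's d as the effect size. A generic sketch (illustrative function, not the authors' analysis code):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_test(changes):
    """One-sample t statistic (against zero) and Cohen's d for a list of
    per-infant mean concentration changes in one ROI and condition."""
    n = len(changes)
    m = mean(changes)
    s = stdev(changes)      # sample standard deviation (n - 1 denominator)
    t = m / (s / sqrt(n))   # t with n - 1 degrees of freedom
    d = m / s               # Cohen's d for a one-sample design
    return t, d
```

Note that in this design d = t / sqrt(n); with 28 infants, the reported t(27) = 4.084 corresponds to d = 4.084 / sqrt(28) ≈ 0.77, consistent with the values above.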
4. Discussion
The current study tested 6-month-old infants' neural response to auditory, visual and audiovisual speech stimuli to assess modality-specific effects in speech processing. Our results revealed the recruitment of speech-sensitive regions in frontal and prefrontal cortex in both hemispheres for uni- and multimodal speech.
Before we discuss results in more detail, it is important to emphasise that we used a modality-matched baseline rather than silence and a blank screen to assess changes in response to speech. This ensures that the brain responses reported in the current study cannot be reduced to a basic response to sensory auditory, visual or audiovisual stimulation, but can be interpreted as a functional response to the speech input. One might, however, argue that the observed response is not specifically related to speech processing but to face-voice processing more generally. From birth, infants prefer faces and voices over other kinds of visual and auditory stimuli (Johnson et al., 1991; Vouloumanos et al., 2010). Thus, the salience of the speaker's face/voice in our speech stimuli might by itself increase attention and impact processing. Since we did not include a control condition using facial non-speech movements (such as gurns, e.g., Calvert et al., 1997) and non-speech vocal sounds (such as grunting), we cannot rule out this possibility. An experiment including such control conditions for all modalities would have been too long to run with 6-month-old infants. Nevertheless, it would be important for future studies to directly compare speech and non-speech conditions that both involve facial and vocal stimuli. Previous findings can indeed be taken to suggest that the distinction between speech and non-speech facial movements and articulatory gestures is not clear-cut early in infancy. First, infants do not only attune their speech perception to the ambient input in the first year of life but also their face perception (Maurer and Werker, 2014). The impact of face perception on audiovisual speech processing might thus change over the course of the first year of life. Second, infants are initially able to match auditory and visual cues not only for human speech but also for human non-speech sounds (Mugitani et al., 2008) and for monkey calls (Lewkowicz and Ghazanfar, 2006), suggesting a broad ability to match multimodal information. Nevertheless, we argue that the observed frontal activation is functional to speech for several reasons. Adult studies found stronger activation of IFC in response to visual speech as compared to facial non-speech movements (Calvert et al., 1997; Campbell et al., 2001; Hall et al., 2005), suggesting a response that is specific to speech rather than to face-voice processing more generally. Furthermore, adults' activation of IFC during processing of visual-only or degraded audiovisual speech is modulated by individual differences in speech reading and learning abilities (Paulescu et al., 2003; Eisner et al., 2010; McGettigan et al., 2012). This is in line with the notion that IFC activation is modulated by linguistic task demands.
Fig. 3. Mean change in oxyHb concentration in frontal and prefrontal brain regions in response to speech. The bar graphs illustrate differences in concentration changes found at frontal and prefrontal sites for speech collapsed across modalities ((1), left panel), and at frontal ((2), middle panel) and prefrontal sites ((3), right panel) depending on modality with AV = audiovisual speech (solid bars), A = auditory speech (striped bars), and V = visual speech (dotted bars); error bars indicate +/- 1 SE, asterisks indicate a significance level of ** p ≤ .01, * p ≤ .02, p = .05.
(footnote continued)
prefrontal right and left sites are no longer significant in the follow-up analysis (adjusted p = 0.11), and only increases in oxyHb for audiovisual speech at left prefrontal sites and for audiovisual and auditory speech at right prefrontal sites remain significant in the planned comparisons (adjusted p = 0.003).
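The correction described in this footnote is the Benjamini-Hochberg step-up procedure. A generic sketch of the procedure (illustrative p-values below, not the study's data):

```python
def benjamini_hochberg(pvals, fdr=0.05):
    """Benjamini-Hochberg step-up procedure (Benjamini and Hochberg, 1995).

    Returns (adjusted p-values in the input order, rejection decisions
    at the given false discovery rate).
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    rejected = [p <= fdr for p in adjusted]
    return adjusted, rejected
```

Unlike a Bonferroni correction, which divides the alpha level by the number of tests, this procedure controls the expected proportion of false discoveries and retains more power with limited samples.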
Of course, findings with adults cannot be generalised to infants. Yet, previous research with infants shows differential activation of IFC for matching and mismatching audiovisual speech with the same age group and similar stimuli as used in the current study (Altvater-Mackensen and Grossmann, 2016). This suggests that IFC is involved in mapping speech information from different modalities in infants and that the current experimental design taps into speech processing. This notion is further supported by the finding that in this prior study IFC activation in response to audiovisual speech stimuli correlated with infants' attention to the speaker's mouth (as assessed through eye tracking preceding the fNIRS recording; Altvater-Mackensen and Grossmann, 2016). In addition, infants' behavioural response to the same matching and mismatching speech stimuli has been shown to be modulated by infants' articulatory knowledge (Altvater-Mackensen et al., 2016), pointing to a potential role for production processes, which might be modulated by the IFC, in speech perception at this age and for these stimuli.
To summarise the main finding, six-month-olds were found to recruit frontal brain regions during processing of auditory-only, visual-only and audiovisual speech. Increased activation of regions in frontal cortex in response to speech was neither significantly modulated by modality nor were there significant differences in activation across hemispheres. The finding that speech processing was not left-lateralised contrasts with our earlier findings on the processing of congruent compared to incongruent audiovisual speech in 6-month-olds (Altvater-Mackensen and Grossmann, 2016). Yet, there is considerable individual variation in the recruitment of IFC during infants' audiovisual speech perception (Altvater-Mackensen and Grossmann, 2016; see also Imada et al., 2006) and even though speech processing is biased to the left hemisphere from early on in infancy, it has been shown to become more strongly lateralised over the first year of life (Minagawa-Kawai et al., 2011). In general, our results replicate previous findings, demonstrating that infants recruit areas in IFC during speech perception (Dehaene-Lambertz et al., 2002; Imada et al., 2006; Kuhl et al., 2014; Peña et al., 2003; Perani et al., 2011), and support the notion that IFC plays a critical role for language learning and processing from early in ontogeny.
Interestingly, we did not find systematic differences in the activation of frontal brain regions during speech processing with respect to modality. This is unexpected given that speech processing in adults is modulated by the congruency of audiovisual speech as well as by language modality (e.g., Calvert et al., 1999; Ojanen et al., 2005). Arguably, modality-specific processing might have been weakened by the fact that we presented auditory, visual and audiovisual stimuli in random order rather than in a blocked design. It would therefore be interesting for future studies to assess if modality-specific effects in infants are modulated by the specifics of stimulus presentation. However, given that there were no consistent differences across modalities, we take our results to suggest that IFC is recruited during uni- and multimodal speech processing in infants.
There are different possible interpretations of the results with respect to the functional role that the IFC plays during infant speech perception. In the adult literature, activation of IFC during speech perception has been found to be modulated by task demands, specifically by the congruence of audiovisual speech information (Ojanen et al., 2005), by stimulus clarity (McGettigan et al., 2012) and by stimulus complexity (syllables vs. words vs. sentences; Peelle, 2012). This is in line with the notion that IFC activation is associated with top-down processes related to attentional control and memory (Song et al., 2015; Friederici, 2002). On a more specific level, activation of the IFC has been taken to suggest activation of motor schemes in the service of speech perception. However, theoretical accounts fundamentally differ in their conception of this perceptuo-motor link. Models taking an embodied approach to speech perception assume that the incoming speech signal is analysed in terms of the associated articulatory information (e.g., motor theory of speech perception, Liberman, 1957; and mirror neuron approaches, Pulvermüller and Fadiga, 2010). According to such models, phonemic information is represented in terms of motor schemes and the motor system is critical for speech perception (for reviews see Galantucci et al., 2006; Cappa and Pulvermüller, 2012). Other accounts assume that the motor system is activated during speech perception to inform production, i.e., to provide corrective feedback and to guide speech gestures (e.g., Hickok et al., 2011, for auditory speech perception; Venezia et al., 2016, for visual speech perception). In this view, the perceptuo-motor link results from the need to tune production to the native sound system in early development. This link may, however, be exploited to predict others' upcoming speech (see also Scott et al., 2009) and to limit potential interpretations of the speech signal (see also Skipper et al., 2007). Similar interpretations of the perceptuo-motor link in terms of phonological learning (Perani et al., 2011) and phonological analysis (Kuhl et al., 2014) can be found in the infant literature.
In the light of this discussion, it is interesting to note that there is increasing behavioural evidence that articulatory information modulates infants' speech perception. First, research suggests that articulatory knowledge acts as a perceptual filter, focusing attention on sound contrasts relevant to concurrent phonological learning in production (Vihman, 1996; Majorano et al., 2014). The motor system might thus exert a top-down influence to guide productive development. Second, articulatory knowledge correlates with infants' ability to map auditory and visual speech cues during audiovisual speech perception (Desjardins et al., 1997; Altvater-Mackensen et al., 2016). This might suggest that infants recruit the motor system during speech perception to retrieve articulatory information that can be used to predict sensory outcomes of (visuo-motoric) mouth movements. Third, concurrent sensorimotor information affects infants' auditory and audiovisual speech perception: infants fail to discriminate sound contrasts that are associated with different tongue tip positions when the tongue is blocked by a pacifier (Yeung and Werker, 2013; Bruderer et al., 2015). This might indicate that infants use articulatory information to interpret the incoming speech signal, i.e., that the motor system is recruited to analyse speech information. Our data do not allow us to disentangle these positions. For future studies it will be important to directly assess to what extent infants' recruitment of IFC during speech perception is related to productive development in babbling and to (silent) imitation processes, for instance by measuring concurrent facial muscle activity, in order to investigate the contribution of articulatory information to infant speech perception.
Given the similar neural response to auditory, visual and audiovisual speech, we take our findings to support the view that infant speech processing involves retrieval and mapping of phonological information from different domains. In particular, our findings are in line with models assuming that phonological representations are inherently multimodal and reflect the association of auditory, visual and motor information in the course of early language development (Westermann and Miranda, 2004). According to this model, hearing or seeing a speech stimulus leads to automatic (co-)activation and retrieval of multimodal phonological information irrespective of the specific modality of the input stimulus itself. In combination with previous findings on the recruitment of the IFC during language processing in infants (Altvater-Mackensen and Grossmann, 2016; Imada et al., 2006; Kuhl et al., 2014), we take our data to suggest that the IFC is pivotal in the learning and processing of such multimodal phonological representations.

Note that visual inspection of the data suggests that the neural response to audiovisual and visual speech might be attenuated in the right compared to the left hemisphere in frontal regions (cf. Fig. 3.2). This difference was, however, not significant in direct planned comparisons.

It should be noted that inferior frontal cortex and specifically Broca's area have mainly been associated with syntactic processing and higher-level unification processes (e.g. Hagoort, 2014; Friederici, 2011). Yet, this is not directly relevant to our study given infants' limited syntactic capabilities and the simple non-referential, syllabic structure of our stimuli.
In addition to activation in IFC, our results show increased activa-
tion of right and left prefrontal sites in response to speech with stronger
eects in the right hemisphere. Again, we found no consistent dier-
ences in neural responses with respect to modality. This suggests that
information from the face and the voice elicit similar patterns of acti-
vation in infantsprefrontal cortex. These ndings are in agreement
with previous reports of right-lateralised activation of prefrontal cortex
in response to socially relevant stimuli, such as speech (for a review see
Grossmann, 2013,2015), and speak to theories that assign a central
role to social information in infantslanguage learning and processing
(e.g., social gating hypothesis,Kuhl, 2007). Prefrontal cortex might thus
serve to evaluate the social relevance of the perceived speech input and
to modulate attention to speech more generally. Such a mechanism of
relevance evaluation and attention control with respect to language
input is in line with theoretical proposals that view language develop-
ment as an inherently social process (e.g., Kuhl, 2007) and may relate to
behavioural ndings showing that social information fosters language
learning (e.g., Kuhl et al., 2003;Goldstein et al., 2003).
To conclude, we tested 6-month-old infants' neural response to auditory, visual and audiovisual speech stimuli using fNIRS. Our results show that infants recruit areas in frontal and prefrontal cortex during the perception of unimodal and multimodal speech. In combination with previous findings, we take our data to indicate that frontal and prefrontal cortex play a critical role in language learning and processing. In particular, we suggest that inferior frontal cortex is involved in the learning and processing of multimodal speech information and that prefrontal cortex serves to evaluate the significance of the speech input.

Acknowledgements

We thank Caterina Böttcher for her help with data collection and coding. We also thank all families who participated in this study. This work was supported by funding awarded by the Max Planck Society (to

References
Altvater-Mackensen, N., Grossmann, T., 2016. The role of left inferior frontal cortex during audiovisual speech perception in infants. NeuroImage 133, 14-20.
Altvater-Mackensen, N., Mani, N., Grossmann, T., 2016. Audiovisual speech perception in infancy: the influence of vowel identity and productive abilities on infants' sensitivity to (mis)matches between auditory and visual speech cues. Dev. Psychol. 52, 191-204.
Baum, S., Martin, R., Hamilton, C., Beauchamp, M., 2012. Multisensory speech perception without the left superior temporal sulcus. NeuroImage 62, 1825-1832.
Beauchamp, M., Argall, B., Bodurka, J., Duyn, J., Martin, A., 2004. Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nat. Neurosci. 7, 1190-1192.
Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289-300.
Bortfeld, H., Wruck, E., Boas, D.A., 2007. Assessing infants' cortical response to speech using near-infrared spectroscopy. NeuroImage 34, 407-415.
Bortfeld, H., Fava, E., Boas, D.A., 2009. Identifying cortical lateralization of speech processing in infants using near-infrared spectroscopy. Dev. Neuropsychol. 34,
Bristow, D., Dehaene-Lambertz, G., Mattout, J., Soares, C., Gliga, T., Baillet, S., Mangin, J.-F., 2008. Hearing faces: how the infant brain matches the face it sees with the speech it hears. J. Cogn. Neurosci. 21, 905-921.
Bruderer, A., Danielson, D., Kandhadai, P., Werker, J., 2015. Sensorimotor influences on speech perception in infancy. Proc. Natl. Acad. Sci. 112, 13531-13536.
Callan, D.E., Callan, A.M., Kroos, C., Vatikiotis-Bateson, E., 2001. Multimodal contribution to speech perception revealed by independent component analysis: a single-sweep EEG case study. Cogn. Brain Res. 10, 349-353.
Calvert, G.A., et al., 1997. Activation of auditory cortex during silent lip reading. Science 276, 593-596.
Calvert, G.A., 2001. Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cereb. Cortex 11, 1110-1123.
Calvert, G.A., Brammer, M., Bullmore, E., Campbell, R., Iversen, S.D., David, A., 1999. Response amplification in sensory-specific cortices during crossmodal binding. NeuroReport 10, 2619-2623.
Campbell, R., et al., 2001. Cortical substrates for the perception of face actions: an fMRI study of the specificity of activation for seen speech and for meaningless lower-face acts (gurning). Cogn. Brain Res. 12, 233-243.
Cappa, S., Pulvermüller, F., 2012. Language and the motor system. Cortex 48, 785-787.
Csibra, G., Gergely, G., 2009. Natural pedagogy. Trends Cognit. Sci. 13, 148-153.
Danielson, D., Bruderer, A., Khandhadai, P., Vatikiotis-Bateson, E., Werker, J., 2017. The organization and reorganization of audiovisual speech perception in the first year of life. Cogn. Dev. 42, 37-48.
Dehaene-Lambertz, G., Dehaene, S., Hertz-Pannier, L., 2002. Functional neuroimaging of speech perception in infants. Science 298, 2013-2015.
Dehaene-Lambertz, G., Hertz-Pannier, L., Dubois, J., Merlaux, S., Roche, A., Sigman, M., Dehaene, S., 2006. Functional organization of perisylvian activation during presentation of sentences in preverbal infants. Proc. Natl. Acad. Sci. 103, 14240-14245.
Dehaene-Lambertz, G., Montavont, A., Jobert, A., Allirol, L., Dubois, J., Hertz-Pannier, L., Dehaene, S., 2010. Language or music, mother or Mozart? Structural environmental influences on infants' language networks. Brain Lang. 114, 53-65.
Desjardins, R., Rogers, J., Werker, J., 1997. An exploration of why pre-schoolers perform differently than do adults in audiovisual speech perception tasks. J. Exp. Child Psychol. 66, 85-110.
Desjardins, R., Werker, J., 2004. Is the integration of heard and seen speech mandatory for infants? Dev. Psychobiol. 45, 187-203.
Dick, A., Solodkin, A., Small, S., 2010. Neural development of networks for audiovisual speech comprehension. Brain Lang. 114, 101-114.
Eimas, P., Siqueland, E., Jusczyk, P., Vigorito, J., 1971. Speech perception in infants. Science 209, 1140-1141.
Eisner, F., McGettigan, C., Faulkner, A., Rosen, S., Scott, S., 2010. Inferior frontal gyrus activation predicts individual differences in perceptual learning of cochlear-implant simulations. J. Neurosci. 30, 7179-7186.
Fava, E., Hull, R., Bortfeld, H., 2014. Dissociating cortical activity during processing of native and non-native audiovisual speech from early to late infancy. Brain Sci. 4,
Friederici, A., 2002. Towards a neural basis of auditory sentence processing. Trends Cognit. Sci. 6, 78-84.
Friederici, A., 2011. The brain basis of language processing: from structure to function. Physiol. Rev. 91, 1357-1392.
Galantucci, B., Fowler, C.A., Turvey, M.T., 2006. The motor theory of speech perception reviewed. Psychon. Bull. Rev. 13, 361-377.
Goldstein, M.H., King, A.P., West, M.J., 2003. Social interaction shapes babbling: testing parallels between birdsong and speech. Proc. Natl. Acad. Sci. 100, 8030-8035.
Grossmann, T., 2013. Mapping prefrontal cortex functions in human infancy. Infancy 18,
Grossmann, T., 2015. The development of social brain functions in infancy. Psychol. Bull. 144, 1266-1287.
Grossmann, T., Johnson, M.H., Lloyd-Fox, S., Blasi, A., Deligianni, F., Elwell, C., Csibra, G., 2008. Early cortical specialization for face-to-face communication in human infants. Proc. R. Soc. B 275, 2803-2811.
Grossmann, T., Oberecker, R., Koch, S.P., Friederici, A.D., 2010. The developmental origins of voice processing in the human brain. Neuron 65, 852-858.
Guellai, B., Streri, A., Young, H.H., 2014. The development of sensorimotor influences in the audiovisual speech domain: some critical questions. Front. Psychol. 5, 1-7.
Hagoort, P., 2014. Nodes and networks in the neural architecture for language: Broca's region and beyond. Curr. Opin. Neurobiol. 28, 136-141.
Hall, D., Fussell, C., Summerfield, A., 2005. Reading fluent speech from talking faces: typical brain networks and individual differences. J. Cogn. Neurosci. 17, 939-953.
Hickok, G., Houde, J., Rong, F., 2011. Sensorimotor integration in speech processing: computational basis and neural organization. Neuron 69, 407-422.
Hickok, G., Poeppel, D., 2007. The cortical organization of speech. Nat. Rev. Neurosci. 8,
Hocking, J., Price, C., 2008. The role of the posterior superior temporal sulcus in audiovisual processing. Cereb. Cortex 18, 2439-2449.
Hyde, D., Jones, B., Porter, C., Flom, R., 2010. Visual stimulation enhances auditory processing in 3-month-old infants and adults. Dev. Psychobiol. 52, 181-189.
Hyde, D., Jones, B., Flom, R., Porter, C., 2011. Neural signatures of face-voice synchrony in 5-month-old human infants. Dev. Psychobiol. 53, 359-370.
Imada, T., Zhang, Y., Cheour, M., Taulu, S., Ahonen, A., Kuhl, P.K., 2006. Infant speech perception activates Broca's area: a developmental magnetoencephalography study. NeuroReport 17, 957-962.
Johnson, M.H., Dziurawiec, S., Ellis, H., Morton, J., 1991. Newborns' preferential tracking of face-like stimuli and its subsequent decline. Cognition 40, 1-19.
Jusczyk, P., 1998. Constraining the search for structure in the input. Lingua 106,
Kabdebon, C., Leroy, F., Simmonet, H., Perrot, M., Dubois, J., Dehaene-Lambertz, G., 2014. Anatomical correlations of the international 10-20 sensor placement system in infants. NeuroImage 99, 342-356.
Kubicek, C., Boisferon, A., Dupierrix, E., Pascalis, O., Loevenbruck, H., Gervain, J., Schwarzer, G., 2014. Cross-modal matching of audio-visual German and French fluent speech in infancy. PLoS One 9, e89275.
Kuhl, P.K., 2007. Is speech learning 'gated' by the social brain? Dev. Sci. 10, 110-120.
Kuhl, P., Meltzoff, A., 1982. The bimodal perception of speech in infancy. Science 218,
Kuhl, P.K., Tsao, F., Liu, H., 2003. Foreign-language experience in infancy: effects of short-term exposure and social interaction on phonetic learning. Proc. Natl. Acad. Sci. 100, 9096-9101.
Kuhl, P.K., Ramirez, R.R., Bosseler, A., Lotus Lin, J.-F., Imada, T., 2014. Infants' brain responses to speech suggest analysis by synthesis. Proc. Natl. Acad. Sci. 111,
Kushnerenko, E., Tomalski, P., Bailleux, H., Potton, A., Birtles, D., Frostick, C., Moore, D.G., 2013. Brain responses and looking behavior during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life. Front. Psychol. 4, 432.
Lewkowicz, D.J., 2010. Infant perception of audio-visual speech synchrony. Dev. Psychol. 46, 66-77.
Lewkowicz, D., Ghazanfar, A., 2006. The decline of cross-species intersensory perception in human infants. Proc. Natl. Acad. Sci. 103, 6771-6774.
Lewkowicz, D., Hansen-Tift, A., 2012. Infants deploy selective attention to the mouth of a talking face when learning speech. Proc. Natl. Acad. Sci. 109, 1431-1436.
Liberman, A., 1957. Some results of research on speech perception. J. Acoust. Soc. Am. 29, 117-123.
Lloyd-Fox, S., Blasi, A., Elwell, C.E., 2010. Illuminating the developing brain: the past, present and future of functional near infrared spectroscopy. Neurosci. Biobehav. Rev. 34, 269-284.
McGurk, H., MacDonald, J., 1976. Hearing lips and seeing voices. Nature 264, 746-748.
Mampe, B., Friederici, A., Christophe, A., Wermke, K., 2009. Newborns' cry melody is shaped by their native language. Curr. Biol. 19, 1994-1997.
Majorano, M., Vihman, M., DePaolis, R., 2014. The relationship between infants' production experience and their processing of speech. Lang. Learn. Dev. 10, 179-204.
Mani, N., Schneider, S., 2013. Speaker identity supports phonetic category learning. J. Exp. Psychol. Hum. Percept. Perform. 39, 623-629.
Maurer, D., Werker, J.F., 2014. Perceptual narrowing during infancy: a comparison of language and faces. Dev. Psychobiol. 56, 154-178.
McGettigan, C., Faulkner, A., Altarelli, I., Obleser, J., Baverstock, H., Scott, S., 2012. Speech comprehension aided by multiple modalities: behavioural and neural interactions. Neuropsychologia 50, 762-776.
Meek, J., 2002. Basic principles of optical imaging and application to the study of infant development. Dev. Sci. 5, 371-380.
Mehler, J., Jusczyk, P.W., Lambertz, G., Halsted, G., Bertoncini, J., Amiel-Tison, C., 1988. A precursor of language acquisition in young infants. Cognition 29, 143-178.
Miller, E.K., Cohen, J.D., 2001. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167-202.
Minagawa-Kawai, Y., Cristia, A., Dupoux, E., 2011. Cerebral lateralization and early speech acquisition: a developmental scenario. Dev. Cogn. Neurosci. 1, 217-232.
Moon, C., Cooper, R., Fifer, W., 1993. Two-day-olds prefer their native language. Infant
Behav. Dev. 16, 495500.
Mugitani, R., Kobayashi, T., Hiraki, K., 2008. Audiovisual matching of lips and non-ca-
nonical sounds in 8-month-old infants. Infant Behav. Dev. 31, 307310.
Nath, A., Beauchamp, M., 2012. A neural basis for interindividual dierences in the
McGurk eect, a multisensory speech illusion. NeuroImage 59, 781787.
Naoi, N., Minagawa-Kawai, Y., Kobayashi, A., Takeuchi, K., Nakamura, K., Yamamoto, J.,
Kojima, S., 2012. Cerebral responses to infant-directed speech and the eect of talker
familiarity. NeuroImage 59, 17351744.
Ojanen, V., Möttönen, R., Pekkola, J., Jääskeläinen, I.P., Joensuu, R., Autti, T., Sams, M.,
2005. Processing of audiovisual speech in Brocas area. NeuroImage 25, 333338.
Patterson, M.L., Werker, J.F., 2003. Two-month-old infants match phonetic information
in lips and voices. Dev. Sci. 6, 191196.
Paulescu, E., et al., 2003. A functional-anatomical model for lipreading. Neurophysiology
90, 20052013.
Peelle, J., 2012. The hemispheric lateralization of speech processing depends on what
speechis: a hierarchical perspective. Front. Hum. Neurosci. 6, 309.
Peña, M., Maki, A., Kovacic, D., Dehaene-Lambertz, G., Koizumi, H., Bouquet, F., Mehler, J., 2003. Sounds and silence: an optical topography study of language recognition at birth. Proc. Natl. Acad. Sci. 100, 11702–11705.
Perani, D., et al., 2011. Neural language networks at birth. Proc. Natl. Acad. Sci. 108,
Poeppel, D., Monahan, P.J., 2011. Feedforward and feedback in speech perception: revisiting analysis by synthesis. Lang. Cogn. Process. 26, 935–951.
Pons, F., Lewkowicz, D., Soto-Faraco, S., Sebastián-Gallés, N., 2009. Narrowing of intersensory speech perception in infancy. Proc. Natl. Acad. Sci. 106, 10598–10602.
Pulvermüller, F., Fadiga, L., 2010. Active perception: sensorimotor circuits as a cortical basis for language. Nat. Rev. Neurosci. 11, 351–360.
Reynolds, G., Bahrick, L., Lickliter, R., Guy, M., 2014. Neural correlates of intersensory processing in 5-month-old infants. Dev. Psychobiol. 56, 355–372.
Rossi, S., Telkemeyer, S., Wartenburger, I., Obrig, H., 2012. Shedding light on words and
sentences: near-infrared spectroscopy in language research. Brain Lang. 121,
Sams, M., Aulanko, R., Hämäläinen, M., Hari, R., Lounasmaa, O.V., Lu, S.T., Simola, J., 1991. Seeing speech: visual information from lip movements modifies activity in the human auditory cortex. Neurosci. Lett. 127, 141–145.
Schwartz, J.-L., Berthommier, F., Savariaux, C., 2004. Seeing to hear better: evidence for early audio-visual interactions in speech identification. Cognition 93, B69–B78.
Scott, S., McGettigan, C., Eisner, F., 2009. A little more conversation, a little less action – candidate roles for the motor cortex in speech perception. Nat. Rev. Neurosci. 10,
Shaw, K., Baart, M., Depowski, N., Bortfeld, H., 2015. Infants' preference for native audiovisual speech dissociated from congruency preference. PLoS One 10, e0126059.
Shaw, K., Bortfeld, H., 2015. Sources of confusion in infant audiovisual speech perception
research. Front. Psychol. 6, 1814.
Skipper, J.I., van Wassenhove, V., Nusbaum, H.C., Small, S.L., 2007. Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception. Cereb. Cortex 17, 2387–2399.
Song, J., Lee, H., Kang, H., Lee, D., Chang, S., Oh, S., 2015. Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users. Brain Struct. Funct. 220, 1109–1125.
Taga, G., Asakawa, K., 2007. Selectivity and localization of cortical response to auditory
and visual stimulation in awake infants aged 2 to 4 months. NeuroImage 36,
Taga, G., Asakawa, K., Hirasawa, K., Konishi, Y., Koizumi, H., 2003. Brain imaging in awake infants by near-infrared optical topography. Proc. Natl. Acad. Sci. 100,
Teinonen, T., Aslin, R., Alku, P., Csibra, G., 2008. Visual speech contributes to phonetic learning in 6-month-olds. Cognition 108, 850–855.
Tenenbaum, E., Shah, R., Sobel, D., Malle, B., Morgan, J., 2013. Increased focus on the mouth among infants in the first year of life: a longitudinal eye-tracking study. Infancy 18, 534–553.
Ter Schure, S., Junge, C., Boersma, P., 2016. Discriminating non-native vowels on the basis of multimodal, auditory or visual information: effects on infants' looking patterns and discrimination. Front. Psychol. 7, 525.
Venezia, J., Fillmore, P., Matchin, W., Isenberg, A., Hickok, G., Fridriksson, J., 2016. Perception drives production across sensory modalities: a network for sensorimotor integration of visual speech. NeuroImage 126, 196–207.
Vihman, M.M., 1996. Phonological Development. The Origins of Language in the Child.
Blackwell, Oxford.
Vouloumanos, A., Hauser, M.D., Werker, J.F., Martin, A., 2010. The tuning of human neonates' preference for speech. Child Dev. 81, 517–527.
Watanabe, H., Homae, F., Nakano, T., Tsuzuki, D., Enkhtur, L., Nemoto, K., Dan, I., Taga, G., 2013. Effect of auditory input on activation in infant diverse cortical regions during audiovisual processing. Hum. Brain Mapp. 34, 543–565.
Weikum, W.M., Vouloumanos, A., Navarra, J., Soto-Faraco, S., Sebastián-Gallés, N.,
Werker, J.F., 2007. Visual language discrimination in infancy. Science 316, 1159.
Werker, J., Tees, R., 1984. Developmental changes across childhood in the perception of nonnative speech sounds. Can. J. Psychol. 37, 278–286.
Westermann, G., Miranda, E.R., 2004. A new model of sensorimotor coupling in the development of speech. Brain Lang. 89, 393–400.
Wilson, S.M., Saygin, A.P., Sereno, M., Iacoboni, M., 2004. Listening to speech activates motor areas involved in speech production. Nat. Neurosci. 7, 701–702.
Yeung, H., Werker, J., 2013. Lip movements affect infants' audiovisual speech perception. Psychol. Sci. 24, 603–612.
N. Altvater-Mackensen, T. Grossmann 