NeuroImage 251 (2022) 118991
Natural infant-directed speech facilitates neural tracking of prosody
Katharina H. Menn a,b,c,1,∗, Christine Michel d,e,1, Lars Meyer b,f, Stefanie Hoehl g,1, Claudia Männel a,h,1

a Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstr. 1a, Leipzig 04103, Germany
b Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstr. 1a, Leipzig 04103, Germany
c International Max Planck Research School on Neuroscience of Communication: Function, Structure, and Plasticity, Stephanstr. 1a, Leipzig 04103, Germany
d Research Group Early Social Cognition, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstr. 1a, Leipzig 04103, Germany
e Faculty for Education, Leipzig University, Marschnerstraße 31, Leipzig 04109, Germany
f Clinic for Phoniatrics and Pedaudiology, University Hospital Münster, Albert-Schweitzer-Campus 1, Münster 48149, Germany
g University of Vienna, Faculty of Psychology, Universitätsring 1, Vienna 1010, Austria
h Department of Audiology and Phoniatrics, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin 13353, Germany

∗ Corresponding author. E-mail address: menn@cbs.mpg.de (K.H. Menn).
1 These authors each contributed equally and should be regarded as shared-first and shared-senior authors, respectively.
Keywords:
EEG
Speech-brain coherence
Speech entrainment
Infant-directed speech
Natural interaction
Adult-directed speech
Abstract

Infants prefer to be addressed with infant-directed speech (IDS). IDS benefits language acquisition through amplified low-frequency amplitude modulations. It has been reported that this amplification increases electrophysiological tracking of IDS compared to adult-directed speech (ADS). It is still unknown which particular frequency band triggers this effect. Here, we compare tracking at the rates of syllables and prosodic stress, which are both critical to word segmentation and recognition. In mother-infant dyads (n = 30), mothers described novel objects to their 9-month-olds while infants’ EEG was recorded. For IDS, mothers were instructed to speak to their children as they typically do, while for ADS, mothers described the objects as if speaking with an adult. Phonetic analyses confirmed that pitch features were more prototypically infant-directed in the IDS-condition compared to the ADS-condition. Neural tracking of speech was assessed by speech-brain coherence, which measures the synchronization between speech envelope and EEG. Results revealed significant speech-brain coherence at both syllabic and prosodic stress rates, indicating that infants track speech in IDS and ADS at both rates. We found significantly higher speech-brain coherence for IDS compared to ADS in the prosodic stress rate but not the syllabic rate. This indicates that the IDS benefit arises primarily from enhanced prosodic stress. Thus, neural tracking is sensitive to parents’ speech adaptations during natural interactions, possibly facilitating higher-level inferential processes such as word segmentation from continuous speech.
1. Introduction
Across many languages, adults address infants in a characteristic register termed infant-directed speech (IDS) (Cristia, 2013; Fernald et al., 1989; Soderstrom, 2007). IDS differs from adult-directed speech (ADS) along acoustic and linguistic dimensions. In particular, IDS contains exaggerated prosodic cues (Fernald and Simon, 1984; Fernald et al., 1989; Grieser and Kuhl, 1988; Katz et al., 1996), is syntactically simpler (Soderstrom et al., 2008) and may be spoken more slowly (Raneri et al., 2020) with expanded vowel sounds (Adriaans and Swingley, 2017; Green et al., 2010). Previous electrophysiological (EEG) work has indicated that these IDS characteristics benefit infants’ speech processing (e.g. Háden et al., 2020; Zangl and Mills, 2007). While earlier EEG studies mostly focused on event-related potentials, we here employ EEG to examine infants’ online speech processing continuously.
There are indications that IDS benefits infants’ language acquisition in particular. Frequent exposure to IDS boosts later vocabulary development (Ramírez-Esparza et al., 2014; Weisleder and Fernald, 2013) and laboratory studies showed that IDS assists infants’ word segmentation (Schreiner and Mani, 2017; Thiessen et al., 2005) and recognition (Männel and Friederici, 2013; Singh et al., 2009), and their acquisition of word-object associations (Graf Estes and Hurley, 2013) over ADS.
Which specic acoustic cues in IDS help infants’ language acqui-
sition? Candidates include increased fundamental frequency (F0) and
F0 modulation (see Spinelli et al., 2017 for a meta-analysis). In re-
cent years, a particular focus has been put on the amplitude modula-
tion structure in IDS. Continuous speech contains acoustic information
at dierent timescales, which to a certain extend correspond to lin-
guistic units, such as phonemes, syllables, and intonation phrases. In
particular, the amplitude envelope conveys the boundaries of linguis-
tic units even to infant listeners who lack vocabulary as such (see also
Goswami, 2019). Leong and Goswami (2015) analyzed the amplitude modulation structure of nursery rhymes, a particularly rhythmic form of IDS, which were read by female speakers prompted with a picture depicting young children. The authors found that amplitude modulations are centered around three frequency rates, which match the occurrence rates of prosodic stress (~2 Hz), syllables (~5 Hz), and phonemes (~20 Hz). When comparing spontaneously produced IDS during mother-infant interactions to ADS that the mother produced when interacting with another adult, Leong et al. (2017) found that amplitude modulations of prosodic stress are enhanced for IDS compared to ADS. This exaggeration of prosodic stress in IDS may be beneficial for infants’ language development, as stress can provide an important cue for word onsets in naturalistic speech (Cutler and Carter, 1987; Jusczyk et al., 1999; Stärk et al., 2021) and thus aid word segmentation. If infants are sensitive to the pronounced stress modulations in IDS, these could provide an important stepping stone into language acquisition.
Recent studies have shown that infants’ neural activity tracks speech by synchronizing with amplitude modulations corresponding to prosodic stress and syllables in nursery rhymes (Attaheri et al., 2022). For adults, it has been shown that the synchronization between neural activity and speech acoustics supports the segmentation and identification of linguistic units in speech (see Meyer, 2018) and relates to better language comprehension (Doelling et al., 2014; Peelle et al., 2013). Importantly, infants were shown to start tracking simple repeated sentences from birth (Ortiz Barajas et al., 2021). This early emergence suggests that neural tracking may support language development by aligning neural activity with speech-relevant amplitude modulations. At least by 7 months of age, infants’ tracking is sensitive to the kind of speech register (IDS vs. ADS), and IDS benefits tracking of speech over ADS (Kalashnikova et al., 2018). It remains unclear, however, whether this benefit results specifically from prosodic stress or from other speech characteristics, such as the syllable rhythm.
We here assess infants’ tracking of speech in a naturalistic mother-infant interaction. The use of naturalistic IDS has the benefit of high ecological validity, as it elucidates infants’ neural processing of the speech input they typically receive and thus increases the generalizability of findings. Naturalistic stimuli also allow for the dissociation of multiple levels of information in parallel (see also Jessen et al., 2021). For this reason, the number of studies relying on naturalistic input for investigating infants’ neural processing of speech has recently started to increase; stimuli have included recordings taken from natural mother-infant interactions (Kalashnikova et al., 2018) and TV cartoons (Jessen et al., 2019), and one study even directly assessed face-to-face interactions (Lloyd-Fox et al., 2015). In face-to-face interactions, the speaker’s visual cues are contingent on infant responses, which is difficult to manipulate in classical experiments. For the current study, the most relevant of these contingent cues is eye contact between parents and infants (mutual gaze), which was shown to increase neural processing of speech if combined with IDS (Lloyd-Fox et al., 2015). However, given the difficulty of manipulating mutual gaze experimentally, its specific effects on infants’ speech processing are currently not well understood (for a review, see Çetinçelik et al., 2020).
In the current study, we focus on the association between parental acoustic speech adaptations and infants’ tracking, aiming to delineate whether neural tracking is facilitated by prosodic stress (defined by pitch contours) or syllable information (defined by the mean syllable duration) in IDS. To this end, we contrast 9-month-old infants’ responses to their mothers’ IDS versus ADS at the stress rate and the syllabic rate. Focusing on 9-month-olds is particularly interesting, as infants at this age have started segmenting words from continuous speech but still mostly rely on prosodic cues (Männel and Friederici, 2013; Schreiner and Mani, 2017), meaning that information in the prosodic stress rate is particularly relevant for their word segmentation (Kooijman et al., 2009). In mother-infant dyads, mothers described novel objects to their 9-month-olds while the infants’ EEG was recorded. For IDS, parents were instructed to speak to their infants as they typically do, while for ADS, parents were supposed to describe the objects pretending to talk to an adult, without looking at the infant or calling their name. Infants’ tracking of maternal speech during the interactions was assessed using speech-brain coherence, which measures the synchronization between the neural signal and the speech envelope. We hypothesized that infants show speech-brain coherence at both the stress rate and the syllable rate. Concerning the difference between IDS and ADS processing, we postulated that IDS facilitates tracking (Kalashnikova et al., 2018) and that this facilitation is driven by enhanced amplitude modulations of prosodic stress (Leong et al., 2017).
2. Method
The present study reanalyzed data from a previous experiment, which assessed the influence of ostensive cues on infants’ visual object encoding (Michel et al., 2021). Parents were asked to show and describe a total of 12 novel objects to their infant during a familiarization phase. Half of the objects were described naturally (IDS-condition); the other half were described without ostensive cues (i.e., mutual gaze, calling the infant by their name, and infant-directed speech; ADS-condition). Importantly, parents were asked to refrain from naming the objects. Given the aim of the present study to examine infants’ neural processing of natural parental speech, we here assessed infants’ tracking of maternal speech during the mother-infant interactions. Only the object description phase was analyzed for the purpose of the current study and is described in this manuscript.
2.1. Participants
The nal participant sample consisted of 30 German-learning infants
(22 female) and their mothers. On average, infants were 9 months 12
days old (range: 9 months 0 days - 9 months, 29 days). Infants were
born full-term ( > 37 weeks), healthy, and raised in monolingual Ger-
man environments. Our sample size was determined by the previously
collected dataset. Michel et al. (2021) based their sample size on stud-
ies investigating infants’ object encoding using similar paradigms and
measures (e.g. Begus et al., 2015; Hoehl et al., 2014 ).
Additional 51 mother-infant (16 female, 𝑀
𝑎𝑔𝑒
= 9 months 15 days)
interactions were tested, but not included in the current analysis due
to less than 30 s total maternal speech in one of the conditions ( 𝑛 =
17), more than 4 noisy electrodes ( 𝑛 = 1), failure to reach the mini-
mum criterion of 20 EEG epochs per condition after artifact rejection
( 𝑛 = 19), premature birth ( 𝑛 = 1), technical error ( 𝑛 = 6), or infant
fussiness ( 𝑛 = 7). Because of the dierent foci of this manuscript and
the original study ( Michel et al., 2021 ), the exclusion criteria diered
between the manuscripts and only 19 infants were commonly included
in both. Informed written consent was obtained from the mothers be-
fore the experiment and ethical approval for the experimental procedure
and reanalysis of the data was obtained from the Medical Faculty of the
University of Leipzig. All work was conducted in accordance with the
Declaration of Helsinki. The conditions of our ethics approval do not
permit public archiving of participant data. Readers seeking access to
the data should contact the corresponding author to arrange a formal
data sharing agreement.
2.2. Procedure
Mothers and infants were seated across from each other at a small table. Infants sat in a baby chair while their electrophysiological activity was continuously recorded using EEG. Mother-infant interactions were recorded on video using four cameras, and maternal speech was recorded using a microphone placed on the table in front of the mother (see Fig. 1A).
Fig. 1. Overview of the experiment and analysis. (A) Example of the setting during the mother-infant interactions. Mother and infant sat across from each other at a table. The mother held a novel object and described it to her infant using either IDS or ADS, while the infant’s EEG was recorded. (B) Overview of the speech-brain coherence analysis. Cleaned EEG and speech envelope were band-pass filtered in two frequency bands: prosodic stress rate and syllable rate. Coherence between EEG and envelope was computed for each electrode in both frequency bands.
The study consisted of 4 blocks, during each of which the mother held three novel objects above the table and spoke about them to her infant. The blocks alternated between the IDS-condition and the ADS-condition. The only difference between the two conditions was the way in which the mother was asked to describe the objects. Mothers were told that the aim of the study was to investigate the difference between joint observation and individual processing of objects on infants’ visual object encoding, as this was the goal of the original study. They were specifically told to focus on eye gaze and speech. In the IDS-condition, the mother was asked to speak to her infant as she normally would when interacting with a novel object. She was specifically told that she could use IDS, call the infant’s name and look at the infant. In the ADS-condition, the mother was instructed to describe the object as if she were speaking to an adult; that is, she was asked to imagine that she was talking to herself or describing the objects to a close friend. She was also asked to refrain from calling the infant’s name and looking at the infant, and specifically from establishing eye gaze during the ADS-condition. In both conditions, the infant was not allowed to touch the objects. The condition of the first block was counterbalanced between dyads. Mothers were given standardized oral and written instructions and were reminded of the procedure before every block.
Each block started with a 20 s baseline, during which infant and mother looked at soap bubbles produced by an experimenter. Afterwards, the object description phase started either after mutual gaze between infant and parent had been established (IDS-condition) or after the child looked at the mother (ADS-condition). In both conditions, the trial ended after the infant had looked at the object for a cumulative total of 20 s. Looking duration was coded online by an experimenter observing the interactions on a screen. A second experimenter then announced the end of a trial by thanking the mother and switched the object. Average trial duration was 39.2 s (SD = 8.6; see Supplementary Fig. 1 for an overview of the whole procedure). Mothers were unaware of the looking time criterion. None of the objects had eyes or face-like features on them. Pretests with an independent sample of infants confirmed that, in general, infants were unfamiliar with the objects and all objects were similarly interesting to infants.
2.3. Speech processing
2.3.1. Preprocessing
Audio recordings were annotated and analyzed using Praat
(Boersma, 2001). We annotated every instance of maternal speech during the object description phase, excluding fragments with any non-speech interference. Instances of such interference included infant vocalizations, laughter, external noise, or (rhythmic) non-speech sounds, such as knocking the object on the table, scratching the surface of the object or tapping against the object. Speech segments with pauses longer than 1000 ms were coded as separate segments.
2.3.2. Amplitude envelope
The broad-band amplitude envelope of the audio signals was computed following Gross et al. (2013) using the Chimera toolbox (Smith et al., 2002). The intensity of the speech signal was normalized per condition. We divided the frequency spectrum from 100 - 8000 Hz into nine frequency bands equally spaced on the cochlea. The audio signal was band-pass filtered into these frequency bands with a fourth-order Butterworth filter (forward and backward). Afterwards, the absolute values of the Hilbert transform were computed for each band and averaged across bands. Last, the envelope was downsampled to 500 Hz, which corresponds to the sampling rate of the EEG signal.
In addition, we computed the pitch envelope for both conditions separately. For this, we determined the respective F0 range for both speech conditions (IDS: 145 - 392 Hz; ADS: 138 - 325 Hz), which we divided into three frequency bands equally spaced on the cochlea. We then followed the same procedure as described for the broad-band envelope.
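To make the pipeline concrete, the following is a minimal Python sketch of this envelope computation (the original analysis used Matlab with the Chimera toolbox). The ERB-rate spacing used to approximate bands "equally spaced on the cochlea" and the audio sampling rate in the usage note are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample_poly

def erb_rate(f_hz):
    # ERB-rate scale (Glasberg & Moore, 1990); one common cochlear spacing
    return 21.4 * np.log10(1 + 0.00437 * f_hz)

def inv_erb_rate(e):
    return (10 ** (e / 21.4) - 1) / 0.00437

def broadband_envelope(audio, fs_audio, fs_out=500, n_bands=9,
                       f_lo=100.0, f_hi=8000.0):
    # Band edges equally spaced on the (approximate) cochlear scale
    edges = inv_erb_rate(np.linspace(erb_rate(f_lo), erb_rate(f_hi),
                                     n_bands + 1))
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Fourth-order Butterworth, applied forward and backward (zero phase)
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs_audio, output="sos")
        band = sosfiltfilt(sos, audio)
        envs.append(np.abs(hilbert(band)))  # Hilbert magnitude per band
    env = np.mean(envs, axis=0)             # average across the nine bands
    return resample_poly(env, fs_out, int(fs_audio))  # downsample to EEG rate

# usage (fs_audio = 44100 is an assumption; the paper does not report it):
# env = broadband_envelope(audio, fs_audio=44100)
```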
2.3.3. Frequency bands
To identify the syllable rate of mothers’ IDS and ADS, we annotated the duration of all syllables for the dyads included in the final analysis. The average syllable duration was 194 ms for the ADS-condition and 181 ms for the IDS-condition. The syllable rate band was determined as the 2 Hz window centered on the rate corresponding to the average syllable duration (ADS: 194 ms or 5.15 Hz; IDS: 181 ms or 5.5 Hz), leading to 4.15 - 6.15 Hz for ADS and 4.5 - 6.5 Hz for IDS.
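The band edges follow directly from the mean durations; a quick arithmetic check (rates rounded as in the text):

```python
# Mean syllable duration -> syllable rate -> 2-Hz analysis window
for label, dur_s in [("ADS", 0.194), ("IDS", 0.181)]:
    rate = 1.0 / dur_s   # 194 ms -> 5.15 Hz; 181 ms -> ~5.5 Hz
    print(f"{label}: {rate:.2f} Hz -> band {rate - 1:.2f}-{rate + 1:.2f} Hz")
```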
The prosodic stress rate of mothers’ speech was identified based on the pitch envelope. For this, we segmented the parts of the pitch envelope corresponding to uninterrupted maternal speech into epochs of 2 s length with 50% overlap. We then computed the Fourier transform of each epoch using Slepian multitapers and averaged the resulting power spectral density (PSD) estimate across epochs and dyads for both speech conditions. The averaged PSD was visually inspected for deviations from the aperiodic 1/f noise. In this way, the frequency band for the prosodic stress rate was determined as 1 - 2.5 Hz. We decided not to assess amplitudes below 1 Hz, since this is the high-pass frequency recommended for the preprocessing of developmental EEG data (see e.g. Gabard-Durnam et al., 2018). The bands identified for the prosodic stress rate and the syllable rate were in line with rates reported in previous studies (e.g. Chandrasekaran et al., 2009; Leong and Goswami, 2015).
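A sketch of this band-identification step is given below; the epoch length and overlap follow the text, while the time-bandwidth product (NW) of the Slepian tapers is an assumption, as the paper does not report it.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal.windows import dpss

def epoch_signal(x, fs, win_s=2.0, overlap=0.5):
    # Cut a 1-D signal into win_s-second epochs with the given overlap
    n = int(win_s * fs)
    step = int(n * (1 - overlap))
    return np.stack([x[s:s + n] for s in range(0, len(x) - n + 1, step)])

def multitaper_psd(epochs, fs, nw=2.0):
    # Multitaper PSD with Slepian tapers, averaged across epochs
    n = epochs.shape[1]
    tapers = dpss(n, NW=nw, Kmax=int(2 * nw) - 1)
    psd = np.zeros(n // 2 + 1)
    for taper in tapers:
        psd += np.mean(np.abs(rfft(epochs * taper, axis=1)) ** 2, axis=0)
    psd /= len(tapers)
    return rfftfreq(n, 1 / fs), psd

# freqs, psd = multitaper_psd(epoch_signal(pitch_env, fs=500), fs=500)
# Inspect psd for peaks above the aperiodic 1/f trend (here: 1-2.5 Hz).
```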
2.3.4. Amplitude modulations
To compute the amplitude modulations at the syllable rate, we filtered the broad-band amplitude envelope in the corresponding frequency bands for IDS and ADS. We then segmented the parts of the envelope corresponding to uninterrupted maternal speech into epochs of 2 s length with 50% overlap. Root mean square values were computed for every epoch and averaged across epochs for both speech conditions.
Amplitude modulations in the prosodic stress rate were computed based on the pitch envelope. We band-pass filtered the pitch envelope in the frequency band corresponding to prosodic stress before proceeding in the same way as described for the syllable rate.
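A minimal sketch of this amplitude-modulation measure, reusing epoch_signal from the previous sketch:

```python
import numpy as np

def amplitude_modulation(filtered_env, fs=500):
    # RMS per 2-s epoch (50% overlap), averaged across epochs
    epochs = epoch_signal(filtered_env, fs)
    return np.sqrt(np.mean(epochs ** 2, axis=1)).mean()
```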
2.4. Experimental manipulation check
To assess whether the speech in the IDS-condition was more typically infant-directed than speech in the ADS-condition, we measured the mean F0 and F0 range (between the 5th and the 95th percentile) of maternal speech in both conditions as acoustic correlates of IDS (see Spinelli et al., 2017). In addition, we tested whether the amplitude modulations in the prosodic stress rate and the syllable rate differed between IDS and ADS. We ran separate t-tests for each acoustic measure, assessing the difference between the IDS- and the ADS-condition. Note that we opted for separate tests in assessing condition differences in amplitude modulations in the two frequency bands, since they were computed based on different envelopes and are therefore not directly comparable. Resulting p-values were corrected for multiple comparisons using the false discovery rate (FDR-correction).
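A sketch of this testing scheme, assuming Benjamini-Hochberg FDR as implemented in SciPy; measure_pairs is a hypothetical container of per-dyad values:

```python
from scipy.stats import ttest_rel, false_discovery_control  # scipy >= 1.11

# measure_pairs: hypothetical dict of per-dyad (IDS, ADS) arrays, one entry
# per acoustic measure (mean F0, F0 range, stress-rate AM, syllable-rate AM)
pvals = {name: ttest_rel(ids, ads).pvalue
         for name, (ids, ads) in measure_pairs.items()}
pvals_fdr = dict(zip(pvals, false_discovery_control(list(pvals.values()))))
```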
2.5. EEG-Recording and preprocessing
EEG was recorded with a 32-channel EasyCap system by Brain Products GmbH, with active electrodes arranged according to the 10/10 system. The sampling rate of the recordings was 500 Hz. The right mastoid served as the online reference, and vertical electrooculograms were recorded bipolarly if tolerated by the infant.
EEG processing was done using the publicly available ’eeglab’ (Delorme and Makeig, 2004) and ’fieldtrip’ (Oostenveld et al., 2011) toolboxes as well as custom Matlab code (The MathWorks, Inc., Natick, US). EEG preprocessing was done automatically using a modified version of the Harvard Automated Preprocessing Pipeline (HAPPE: Gabard-Durnam et al., 2018). In line with HAPPE, data was re-referenced to Cz to obtain symmetrical components in the ICA, high-pass filtered with a noncausal finite impulse response filter (pass-band: 1 Hz, -6 dB cutoff: 0.5 Hz), and electrical line noise (50 Hz) was removed using ZapLine from NoiseTools (de Cheveigné, 2020). Noisy channels were identified by assessing the normed joint probability of the average log power from 1 - 125 Hz and rejected if exceeding a threshold of 3 SD from the mean (mean number of removed channels = 1; range: 0-4). We applied a wavelet-enhanced ICA (Castellanos and Makarov, 2006) with a threshold of 3 to remove large artifacts, before the data was decomposed with ICA and artifact-related components were automatically rejected using MARA (Winkler et al., 2011; mean number of rejected components = 14, range: 7-25). Afterwards, noisy channels were interpolated using spherical splines and the data was re-referenced to the linked mastoids.
EEG data and the broad-band speech envelope were band-pass filtered at the stress and syllable rates. Filter order was optimised through the Parks-McClellan algorithm (Parks and McClellan, 1972). For the prosodic stress band, this resulted in a 14572nd-order one-pass 1-2.5 Hz band-pass filter. The phase shift was compensated for by a corresponding time shift. For the syllabic band, we used a 15883rd-order one-pass filter with pass-frequencies of 4.5 - 6.5 Hz for IDS and 4.15 - 6.15 Hz for ADS. All data were padded before filter application.
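The following Python sketch illustrates an equiripple (Parks-McClellan) band-pass design with delay compensation; the transition width and the much shorter illustrative filter order are assumptions, since the paper reports only the optimised orders.

```python
import numpy as np
from scipy.signal import remez, lfilter

fs = 500.0
numtaps = 2001                   # illustration only; the optimised order was 14572
lo, hi, trans = 1.0, 2.5, 0.5    # stress band; transition width is an assumption
taps = remez(numtaps,
             [0, lo - trans, lo, hi, hi + trans, fs / 2],
             [0, 1, 0], fs=fs)

def bandpass_onepass(x, taps):
    # One-pass FIR filtering; the linear-phase delay of (numtaps - 1) / 2
    # samples is compensated by shifting the output back in time
    padded = np.concatenate([x, np.zeros(len(taps) - 1)])
    y = lfilter(taps, 1.0, padded)
    delay = (len(taps) - 1) // 2
    return y[delay:delay + len(x)]
```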
The artifact-corrected EEG data was segmented into continuous trials corresponding to the annotated maternal speech and combined with the respective broad-band speech envelope, which had been downsampled to 500 Hz. The combined data was segmented into 2 s epochs with 50% overlap. Epochs with amplitudes exceeding ±40 μV in any channel were rejected automatically. On average, infants contributed a total of 112 epochs to the analysis (M_IDS = 57.8, SD = 27.4; M_ADS = 54.2, SD = 32.8). The 23 channels included in the final analysis were: Fz, F3/4, F7/8, FC1/2, FC3/4, FT7/8, Cz, C3/4, T7/8, CP3/4, Pz, P3/4, and P7/8. We removed the outer channels from the final analysis, since the EEG signal was consistently noisy across infants.
2.6. Data analysis
2.6.1. Speech-brain coherence
The relationship between speech and brain signal was quantified using Hilbert coherence over time (see Fig. 1B). The coherence value measures the phase-synchronization between the EEG signal and the corresponding speech envelope, weighted by their relative amplitude. Coherence is measured on a scale from 0 (random coupling) to 1 (perfect synchronization).
Coherence between the speech envelope and individual electrodes in both frequency bands was computed according to the formula

$$\mathrm{Coh}_{xy}(f) = \frac{|P_{xy}(f)|^2}{P_{xx}(f)\,P_{yy}(f)},$$

where $P_{xy}(f)$ is the cross-spectral density between the band-pass filtered speech and EEG signals, and $P_{xx}(f)$ and $P_{yy}(f)$ are the auto-spectral densities of the speech and EEG signals, respectively.
To analyze whether speech-brain coherence was higher than expected by chance, the observed coherence values were compared against surrogate data. Surrogate data was created by randomly pairing the epoched EEG data with the broad-band speech envelope from a randomly selected epoch from the same or a different dyad and applying a circular shift to the envelope time series (Keitel et al., 2017). This process was repeated for 10,000 permutations.
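A compact Python sketch of both steps, the Hilbert-based coherence estimate and the circular-shift surrogates, under the simplifying assumption of a single electrode and equal-length epochs:

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_coherence(eeg_epochs, env_epochs):
    # Coh = |Pxy|^2 / (Pxx * Pyy), with spectral densities estimated from
    # the analytic (Hilbert) signals of the band-passed inputs
    ax = hilbert(eeg_epochs, axis=1)
    ay = hilbert(env_epochs, axis=1)
    pxy = np.mean(ax * np.conj(ay))
    pxx = np.mean(np.abs(ax) ** 2)
    pyy = np.mean(np.abs(ay) ** 2)
    return np.abs(pxy) ** 2 / (pxx * pyy)   # 0 = random, 1 = perfect

def surrogate_coherence(eeg_epochs, env_pool, rng):
    # Pair each EEG epoch with a random envelope epoch (possibly from
    # another dyad) and apply a random circular shift to its time series
    idx = rng.integers(len(env_pool), size=len(eeg_epochs))
    shift = rng.integers(env_pool.shape[1], size=len(eeg_epochs))
    surr = np.stack([np.roll(env_pool[i], s) for i, s in zip(idx, shift)])
    return hilbert_coherence(eeg_epochs, surr)

# null = [surrogate_coherence(eeg_epochs, env_pool, np.random.default_rng(i))
#         for i in range(10000)]
```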
2.6.2. Analyses
The observed and permuted coherence values for each infant were averaged across trials and channels. P-values were derived as the proportion of coherence values in the permutation distribution exceeding the observed value. To assess differences between IDS and ADS, we ran a repeated-measures ANOVA with speech condition (IDS vs. ADS) and frequency rate (syllabic rate vs. prosodic rate) as within-subjects factors.
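The permutation p-value then reduces to one line (a sketch, continuing from the surrogate example above):

```python
import numpy as np

# Proportion of surrogate coherence values exceeding the observed value;
# null is the 10,000-permutation distribution from the previous sketch
p_value = np.mean(np.asarray(null) >= observed_coherence)
```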
3. Results
Maternal speech in the IDS-condition was more prototypically infant-directed than in the ADS-condition. Speech had a significantly higher mean pitch, t(29) = 7.2, p < .001, and pitch range, t(29) = 6.21, p < .001, in the IDS-condition compared to the ADS-condition. The amplitude modulations were significantly higher for IDS than ADS in the stress rate, t(29) = 4.1, p < .001, but not in the syllable rate, t(29) = 0.71, p = .482. Table 1 summarizes the descriptive statistics of the acoustic measures. For further summary statistics of speech content, see Supplementary Table 1.
The permutation test showed significant speech-brain coherence for both the prosodic stress rate, p < .001, and the syllable rate, p < .001 (Fig. 2A). The repeated-measures ANOVA showed a significant main effect of speech condition, F(1, 29) = 160.77, p < .001, and no significant main effect of frequency rate, F(1, 29) = 2.43, p = .13. Importantly, we observed a significant interaction between speech condition and frequency rate, F(1, 29) = 9.14, p = .005 (Fig. 2B). Follow-up t-tests revealed that speech-brain coherence for the stress rate was significantly higher in the IDS-condition (M_IDS = 0.492, SD = 0.025) than in the ADS-condition (M_ADS = 0.475, SD = 0.022), t(29) = 3.4, p = .002. We found no evidence for a difference between the IDS-condition (M_IDS = 0.419, SD = 0.02) and the ADS-condition (M_ADS = 0.425, SD = 0.02) for the syllable rate, t(29) = −0.99, p = .33. Analyses were repeated on non-normalized data to ensure that the difference between conditions did not arise from intensity differences. The pattern of the results did not change.
Table 1
Analysis of speech acoustics. Standard deviation in brackets.

| Acoustic Measure | | IDS | ADS | p-value |
|---|---|---|---|---|
| Pitch (F0) | Mean | 238 Hz (28) | 214 Hz (19) | < .001 |
| | Range | 247 Hz (62) | 188 Hz (49) | < .001 |
| Amplitude Modulations (a.u.; 1 × 10⁻³) | Stress Rate | 2.5 (0.50) | 2.1 (0.46) | < .001 |
| | Syllable Rate | 1 (0.14) | 0.96 (0.15) | .482 |
Fig. 2. Overview of our results. (A) Coherence values were averaged across all electrodes. Error bars depict standard errors. Dashed lines indicate 95% significance cut-offs based on a permutation baseline. Speech-brain coherence was significantly higher than chance for both IDS and ADS in the two frequency rates. (B) Scalp topography for the comparison IDS versus ADS. Asterisks indicate electrodes included in the cluster in the control analysis. For the main analysis, we compared averages across all electrodes. The difference between IDS and ADS was significantly higher in the stress rate than in the syllable rate.
3.1. Control analysis: Ostensive cues
Ostensive cues potentially influence speech processing (see Çetinçelik et al., 2020; Csibra and Gergely, 2009). In our study, such cues were primarily present in the IDS-condition. We therefore conducted additional analyses to control for the possibility that the tracking difference between IDS and ADS observed in our study was based on differences in ostensive cues, specifically focusing on mutual eye gaze, infant looks to the mother’s face, and mentioning the infant’s name.
In every frame of the video recording, mother’s and infant’s gaze were coded as looking to the object, to the face of the interaction partner, to the environment, or as non-codeable. The reliability of the codes was excellent (ICC for mothers = 0.994, ICC for infants = 0.987). Mutual gaze was defined as periods with simultaneous gaze on the other interaction partner. We then reanalyzed the data excluding all epochs containing mutual eye gaze. On average, infants contributed a total of 103 epochs to the follow-up analysis (M_IDS = 49.4, SD = 23.2; M_ADS = 54.1, SD = 32.7). A paired t-test comparing the speech conditions in the stress rate showed that speech-brain coherence was still significantly higher for the IDS-condition (M_IDS = 0.489, SD = 0.023) than the ADS-condition (M_ADS = 0.475, SD = 0.022) after controlling for the effect of mutual eye gaze, t(29) = 2.87, p = .008. It is, however, possible that infants show a sustained effect of mutual gaze beyond the epoch. We therefore also excluded the 5 epochs succeeding mutual eye gaze. This also did not change the pattern of our results. Note that we were unable to exclude the whole object description trial in which mutual eye gaze occurred, as this would have left us with too few epochs for a reliable comparison.
In addition, we compared tracking of IDS in the prosodic stress rate between infants with high mutual gaze and infants with low mutual gaze, grouped by a median split of the number of epochs containing mutual gaze. The two groups did not significantly differ, t(28) = 0.467, p = .64.
To assess the possibility that the IDS advantage for tracking in the prosodic stress rate was driven by maternal visual cues other than mutual gaze, we excluded all epochs in which the infant looked at the mother’s face, irrespective of whether there was mutual gaze or not. On average, infants contributed a total of 90.9 remaining trials to this follow-up analysis (M_IDS = 45.1, SD = 23.3; M_ADS = 45.8, SD = 26.57). Speech-brain coherence in the prosodic stress rate remained significantly higher for the IDS-condition (M_IDS = 0.489, SD = 0.026) than the ADS-condition (M_ADS = 0.472, SD = 0.025) after excluding these epochs in which infants were looking at their mother’s face, t(29) = 3.07, p = .005.
Lastly, we assessed whether the amount of calling the infant’s name in the IDS-condition drove the IDS facilitation in the stress rate. On average, mothers called their infant’s name 3.9 times in the IDS-condition (SD = 3.7). We compared tracking in the stress rate between infants who experienced high calling of their name and infants who experienced low calling of their name, grouped based on a median split (median = 3.5). There was no significant difference between the two name-calling groups, t(28) = 0.7, p = .489. Note that we only controlled for instances in which the infant’s full name or an abbreviation of it was mentioned, but not for other potentially attention-evoking phrases that mothers commonly use in IDS. We therefore cannot fully rule out that the use of such phrases increased attention specifically in the IDS-condition.
3.2. Control analysis: Topography
All EEG analyses reported before were done on coherence values averaged across the 23 selected electrodes. This approach may hide topography differences between the IDS- and the ADS-condition in the two frequencies of interest. To assess this possibility, we conducted a control analysis on the electrode level, using threshold-free cluster enhancement with 10,000 permutations for multiple comparison correction (height-weight = 2, extent-weight = 0.5; Smith and Nichols, 2009). In line with our earlier results, we found a significant difference between the IDS- and the ADS-condition in the prosodic stress rate (p < .001), but not in the syllable rate. The difference in the stress rate was driven by a left-central cluster that included electrodes F3, FC3, FC1, C3, CP3, P3, Cz, FC2, FC4, and CP4. These electrodes are marked by asterisks in the topography plot in Fig. 2B.
3.3. Control analysis: Pauses
IDS has been related to an increased number of pauses compared to ADS (Martin et al., 2016), which may form acoustic edges that can contribute to speech-brain coherence (Gross et al., 2013). In line with earlier findings, the IDS-condition (25 pauses/min, SD = 11.3) had a higher rate of pauses than the ADS-condition (17.3 pauses/min, SD = 11.1), t(29) = 3.82, p < .001. Pause durations did not differ between the two conditions (M_IDS = 259 ms, SD = 75; M_ADS = 250 ms, SD = 78), t(29) = 0.63, p = .536. To assess whether the increased number of pauses in IDS contributes to the IDS advantage for tracking, we compared phase-clustering from 1 to 8 Hz (in steps of 0.5 Hz) at word onsets following pauses (and thus forming an acoustic edge) to phase-clustering at word onsets within continuous speech. The analysis assessed phase-clustering starting 100 ms before word onset until 1 second after in steps of 10 ms for all electrodes individually, and the numbers of word onsets contributing to the analysis were matched. Our analysis used cluster-based permutation for multiple comparison correction and showed no significant difference in phase-clustering between the two types of word onsets (p > .1). Next, we compared phase-clustering at pause offset between the IDS- and the ADS-condition using the same frequencies and time window. The cluster-based permutation analysis showed no significant difference in phase-clustering between the two conditions (p > .1), giving no evidence that infants’ neural responses to pauses differed between IDS and ADS. Lastly, we compared tracking in the stress rate between infants with a higher rate of pauses and infants with a lower rate of pauses, grouped based on a median split (median = 25.8). The two groups showed no significant differences in tracking, t(29) = 0.69, p = .5. While this does not exclude the possibility that pauses and associated acoustic edges increase speech-brain coherence, we find no evidence that they are the main driver of the IDS facilitation for tracking in the stress rate.
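The paper does not spell out its phase-clustering estimator; the sketch below shows inter-trial phase coherence, one standard way to quantify phase-clustering across word onsets:

```python
import numpy as np
from scipy.signal import hilbert

def phase_clustering(onset_epochs):
    # Inter-trial phase clustering: resultant length of unit phase vectors
    # across onsets, per time point (input: EEG band-passed at one frequency,
    # epoched around word onsets; shape n_onsets x n_samples)
    phases = np.angle(hilbert(onset_epochs, axis=1))
    return np.abs(np.mean(np.exp(1j * phases), axis=0))  # 0 = random, 1 = aligned
```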
4. Discussion
The present study set out to investigate infants’ neural tracking of natural IDS compared to ADS and to delineate whether the IDS facilitation is driven by prosodic stress. We observed significant tracking of speech at both the stress and the syllable rate during natural interactions of 9-month-olds with their mothers. Adding to previous findings, we report here that tracking is facilitated by IDS and that this effect is specific to the prosodic stress rate. This suggests that the IDS advantage for infants’ tracking is specifically based on enhanced prosodic stress and not on the syllable rhythm. Our finding emphasizes the important role of IDS for infants’ speech processing and possibly their language development.
At the age of 9 months, infants have started to segment words from continuous speech (Junge et al., 2014; Jusczyk et al., 1999; Männel and Friederici, 2013), facilitated by IDS (Schreiner and Mani, 2017). Speech segmentation is crucial for the acquisition of higher-level linguistic meaning, and better word segmentation in infancy was shown to predict later vocabulary size (Junge et al., 2012) and syntactic skills (Kooijman et al., 2013). Since continuous speech contains no pauses between words, infants must rely on other cues to detect word boundaries. In stress-based languages like English or German, stressed syllables can provide a valuable cue for segmenting words from continuous speech (Jusczyk et al., 1999), as the majority of content words in these languages have word-initial stress (Cutler and Carter, 1987; Stärk et al., 2021). Our study shows that not only do mothers enhance their amplitude modulations at the prosodic stress rate in IDS, but infants also track this enhancement. This suggests that tracking might facilitate higher-level inferential processes such as word segmentation.
Because of the way this study was set up, the IDS-condition included a number of additional ostensive cues that were not present in the ADS-condition. Most relevant are the addition of mutual gaze between mother and infant and the calling of the infant’s name, as mothers were specifically told to focus on these cues. In addition, it is possible that mothers increased other visual cues in the IDS-condition, as adults were shown to exaggerate facial expressions such as lip and head movements when addressing children (Green et al., 2010; Smith and Strader, 2014; Swerts and Krahmer, 2010), which we were unable to assess in the current study. These ostensive cues are special as they help guide infants’ attention to maternal speech (Csibra and Gergely, 2006; 2009) and consequently may have helped increase infants’ speech processing (for a review, see Çetinçelik et al., 2020). However, we find that the IDS-condition specifically facilitated tracking in the prosodic stress rate and no evidence for an IDS facilitation in the syllable rate. This finding is not compatible with a general increase of attention to maternal speech by ostensive cues in the IDS-condition. In addition, our control analysis showed that the IDS benefit for tracking persists even after we excluded epochs with mutual eye gaze, and that infants who experienced more calling of their name did not show higher tracking of IDS in the prosodic stress rate than infants who experienced less calling of their name. These results do not imply that visual information is irrelevant for speech processing. Previous studies have shown that visual information increases tracking of speech in adults (Bourguignon et al., 2020; Crosse et al., 2015) and likely also in children (Power et al., 2012). As our design does not allow us to investigate whether the frequency of visual exaggerations in the IDS-condition coincides with the prosodic stress rate, we conducted a control analysis excluding all epochs during which the infant looked at the mother. Even for the parts of the interactions in which the infants did not look at the mother, the IDS tracking advantage in the prosodic stress rate persisted. This supports our conclusion that the IDS benefit for speech processing results from its acoustic properties, even though we cannot fully exclude the possibility that infants still perceived some exaggerated visual cues even if they did not directly look at the mother’s face. Further studies are needed to dissociate the unique contributions of visual and acoustic cues to infants’ neural processing of IDS.
Regarding parental acoustic speech modulations, the enhanced amplitude modulation in the slow stress rate could assist infants’ tracking of speech by increasing rhythmic cues. Natural speech is not perfectly regular. This lack of clear rhythm is a challenge for the synchronization between neural activity and speech input. In adults, linguistic knowledge can compensate for the lack of rhythm by top-down modulation of auditory activity via linguistic predictions (Keitel et al., 2017; Meyer et al., 2019; Rimmele et al., 2018; Ten Oever and Martin, 2021). Yet, preverbal infants still lack the linguistic knowledge required for such predictions. The enhancement of slow amplitude modulations in IDS could compensate for this lack by providing additional acoustic cues that aid tracking at the prosodic stress rate. A second possibility is that IDS modulates tracking by increasing infants’ attention, possibly via a combination of visual and acoustic cues. The typical acoustic correlates of IDS were shown to increase infants’ attention compared to ADS (ManyBabies Consortium, 2020; Cooper and Aslin, 1990; Kaplan et al., 1995; Roberts et al., 2013). Neural tracking is affected by attention (Fuglsang et al., 2017) and reflects the selection of relevant attended information (Obleser and Kayser, 2019). Increased tracking of IDS in the prosodic stress rate may thus reflect 9-month-olds’ enhanced attention to prosodic stress, which provides them with a relevant acoustic cue aiding word segmentation. These two interpretations are not mutually exclusive but may explain our findings as a combination of enhanced acoustic cues in maternal speech and increased attention of the infant to prosodic stress in IDS.
One question that we cannot answer here is whether the enhanced synchronization between neural activity and IDS observed here results from genuine entrainment of endogenous oscillations or from auditory-evoked responses (see Keitel et al., 2021). It has been suggested that oscillations in the auditory cortex phase-lock to acoustic information in a frequency-specific manner (Lakatos et al., 2013). In speech processing, F0 amplitude rhythms might entrain neural oscillations in the delta frequency (Bourguignon et al., 2013). For our current results, this could indicate that the amplitude edges or peaks in the prosodic stress rate of IDS provide sufficient rhythmic cues to allow for a phase-alignment of oscillatory activity operating in the frequency range of prosodic stress. Another possibility is that the exaggeration of prosodic stress in IDS leads to a series of evoked responses that are superimposed on neural activity and thus appear in the same frequency band as the prosodic stress rate. Our results are compatible with both explanations; therefore, future work is required to distinguish these two accounts of infants’ processing of IDS. Since both possibilities result in increased neural processing of acoustic information in the prosodic stress rate of IDS, they are also both compatible with our interpretation that tracking facilitates infants’ word segmentation from continuous IDS.
Our study provides further evidence for the previously proposed importance of prosody in assisting speech processing. This is especially relevant in light of healthy parent-infant interactions, given evidence that clinically depressed mothers show less IDS, potentially impacting children’s language development (Lam-Cassettari and Kohlhoff, 2020; Liu et al., 2017; Stein et al., 2008). In healthy parent-infant interactions, IDS may be optimally adapted to infants’ needs during language development (see Kalashnikova and Burnham, 2018). As infants grow older, the amount of parents’ IDS decreases and its acoustic characteristics change (Kitamura and Burnham, 2003; Raneri et al., 2020). Leong et al. (2017) showed that the enhancement of prosodic amplitude modulations in IDS decreases when mothers are talking to older infants. These changes in IDS may be tied to infants’ increased linguistic knowledge, as parents were shown to use more prototypically infant-directed speech when talking to infants with lower language abilities (Bohannon and Marquis, 1977; Kalashnikova et al., 2020; Reissland and Stephenson, 1999). Importantly, speech tracking was shown to increase with linguistic knowledge (Chen et al., 2020; Choi et al., 2020), meaning that infants’ tracking may rely less on acoustic cues in IDS as their linguistic knowledge increases. This implies that parents adapt the acoustic properties of their speech to their infants’ language development to allow for a level of tracking that is optimal for the infants’ current language status. Future studies need to evaluate the interactions between parents’ speech adaptations and infants’ linguistic knowledge on infants’ tracking of speech. The current study contributes an empirical foundation for such future investigations, by showing that neural tracking is sensitive to parents’ speech adaptations during natural interactions, likely facilitating higher-level inferential processes such as word segmentation. This makes tracking a potential neural mechanism for infants’ word segmentation from continuous speech.
Data and Code Availability Statement
Data availability
The conditions of our ethics approval do not permit public archiving
of participant data. Readers seeking access to the data should contact
the corresponding author to arrange a formal data sharing agreement.
Code availability
Preprocessing of the EEG data was done using the publicly available HAPPE pipeline V1 (DOI: 10.3389/fnins.2018.00097; download: https://github.com/lcnhappe/happe) in EEGLAB v2019.1 (DOI: https://doi.org/10.1515/bmt-2013-4182; download: https://sccn.ucsd.edu/eeglab/download.php) and in fieldtrip (version from 20200521) (DOI: https://doi.org/10.1155/2011/156869; download: https://www.fieldtriptoolbox.org/download.php). Custom code was written for the computation of speech envelopes and Hilbert coherence and will be made available if the article is accepted for publication.
Declaration of Competing Interest
The authors declare that there is no conflict of interest.
Funding
This work was supported by the Max Planck Society. The funders
had no role in the conceptualization, design, data collection, analysis,
decision to publish, or preparation of the manuscript.
CRediT authorship contribution statement
Katharina H. Menn: Conceptualization, Formal analysis, Visualization, Writing – original draft. Christine Michel: Conceptualization, Investigation, Data curation, Writing – review & editing. Lars Meyer: Conceptualization, Formal analysis, Writing – original draft, Supervision. Stefanie Hoehl: Conceptualization, Resources, Writing – review & editing. Claudia Männel: Conceptualization, Supervision, Writing – original draft.
Acknowledgments
We are grateful to the infants and parents who participated. We
thank Milena Marx, Katja Höhne, Ole Scholand, Johanna Bayón,
Leonie Grandpierre, Melanie Schwan, Annika Behlen and Ann Sophie
von Schwartzenberg for their assistance with data collection, Claudia
Geißler and Sophia Richter for their help with the speech processing,
Christina Münchberger, Luka Büttner and Florian Teichmann for coding
the eye gaze and Johanna Lieb for her assistance with the data prepara-
tion.
Supplementary material
Supplementary material associated with this article can be found, in
the online version, at doi: 10.1016/j.neuroimage.2022.118991
References
Adriaans, F. , Swingley, D. , 2017. Prosodic exaggeration within infant-directed speech:
consequences for vowel learnability. J. Acoust. Soc. Am. 141 (5), 3070–3078 .
Attaheri, A., Choisdealbha, Á.N., Di Liberto, G.M., Rocha, S., Brusini, P., Mead, N., Olawole-Scott, H., Boutris, P., Gibbon, S., Williams, I., et al., 2022. Delta- and theta-band cortical tracking and phase-amplitude coupling to sung speech by infants. NeuroImage 118698. doi: 10.1016/j.neuroimage.2021.118698.
Begus, K., Southgate, V., Gliga, T., 2015. Neural mechanisms of infant learning: differences in frontal theta activity during object exploration modulate subsequent object recognition. Biol. Lett. 11 (5), 20150041.
Boersma, P., 2001. Praat, a system for doing phonetics by computer. Glot International 5
(9), 341–345. https://hdl.handle.net/11245/1.200596 .
Bohannon, I., Marquis, A.L., 1977. Children’s control of adult speech. Child Dev. 1002–1008. doi: 10.2307/1128352.
Bourguignon, M., Baart, M., Kapnoula, E.C., Molinaro, N., 2020. Lip-reading enables the brain to synthesize auditory features of unknown silent speech. J. Neurosci. 40 (5), 1053–1065. doi: 10.1523/JNEUROSCI.1101-19.2019.
Bourguignon, M., De Tiege, X., De Beeck, M.O., Ligot, N., Paquier, P., Van Bogaert, P.,
Goldman, S., Hari, R., Jousmäki, V., 2013. The pace of prosodic phrasing cou-
ples the listener’s cortex to the reader’s voice. Hum Brain Mapp 34 (2), 314–326.
doi: 10.1002/hbm.21442 .
Castellanos, N.P., Makarov, V.A., 2006. Recovering EEG brain signals: artifact suppression with wavelet enhanced independent component analysis. J. Neurosci. Methods 158 (2), 300–312. doi: 10.1016/j.jneumeth.2006.05.033.
Çetinçelik, M., Rowland, C.F., Snijders, T.M., 2020. Do the eyes have it? A systematic review on the role of eye gaze in infant language development. Front. Psychol. doi: 10.3389/fpsyg.2020.589096.
Chandrasekaran, C. , Trubanova, A. , Stillittano, S. , Caplier, A. , Ghazanfar, A.A. , 2009. The
natural statistics of audiovisual speech. PLoS Comput. Biol. 5 (7) .
Chen, Y., Jin, P., Ding, N., 2020. The influence of linguistic information on cortical tracking of words. Neuropsychologia 148, 107640. doi: 10.1016/j.neuropsychologia.2020.107640.
de Cheveigné, A., 2020. ZapLine: a simple and effective method to remove power line artifacts. NeuroImage 207, 116356. doi: 10.1016/j.neuroimage.2019.116356.
Choi, D., Batterink, L.J., Black, A.K., Paller, K.A., Werker, J.F., 2020. Preverbal infants
discover statistical word patterns at similar rates as adults: evidence from neural en-
trainment. Psychol Sci 31 (9), 1161–1173. doi: 10.1177/0956797620933237 .
ManyBabies Consortium, 2020. Quantifying sources of variability in infancy research using the infant-directed-speech preference. Advances in Methods and Practices in Psychological Science 3 (1), 24–52. doi: 10.1177/2515245919900809.
Cooper, R.P., Aslin, R.N., 1990. Preference for infant-directed speech in the first month after birth. Child Dev. 61 (5), 1584–1595. doi: 10.1111/j.1467-8624.1990.tb02885.x.
Cristia, A., 2013. Input to language: the phonetics and perception of infant-directed
speech. Linguistics and Language Compass 7 (3), 157–170. doi: 10.1111/lnc3.12015 .
Crosse, M.J., Butler, J.S., Lalor, E.C., 2015. Congruent visual speech enhances cortical
entrainment to continuous auditory speech in noise-free conditions. J. Neurosci. 35
(42), 14195–14204. doi: 10.1523/JNEUROSCI.1829-15.2015 .
Csibra, G. , Gergely, G. , 2006. Social learning and social cognition: the case for pedagogy.
Processes of change in brain and cognitive development. Attention and performance
XXI 21, 249–274 .
Csibra, G., Gergely,
G., 2009. Natural pedagogy. Trends Cogn. Sci. (Regul. Ed.) 13 (4),
148–153. doi: 10.1016/j.tics.2009.01.005 .
Cutler, A., Carter, D.M., 1987. The predominance of strong initial syllables in the English vocabulary. Computer Speech & Language 2 (3–4), 133–142. doi: 10.1016/0885-2308(87)90004-0.
Delorme, A., Makeig, S., 2004. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134 (1), 9–21. doi: 10.1016/j.jneumeth.2003.10.009.
Doelling, K.B., Arnal, L.H., Ghitza, O., Poeppel, D., 2014. Acoustic landmarks drive delta–
theta oscillations to enable speech comprehension by facilitating perceptual parsing.
Neuroimage 85, 761–768. doi: 10.1016/j.neuroimage.2013.06.035 .
Fernald, A., Simon, T., 1984. Expanded intonation contours in mothers’ speech to new-
borns. Dev Psychol 20 (1), 104. doi: 10.1037/0012-1649.20.1.104 .
Fernald, A., Taeschner, T., Dunn, J., Papousek, M., de Boysson-Bardies, B., Fukui, I., 1989. A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. J. Child Lang. 16 (3), 477–501. doi: 10.1017/S0305000900010679.
Fuglsang, S.A., Dau, T., Hjortkjær, J., 2017. Noise-robust cortical tracking of
attended speech in real-world acoustic scenes. Neuroimage 156, 435–444.
doi: 10.1016/j.neuroimage.2017.04.026 .
Gabard-Durnam, L.J., Mendez Leal, A.S., Wilkinson, C.L., Levin, A.R., 2018. The Harvard Automated Processing Pipeline for Electroencephalography (HAPPE): standardized processing software for developmental and high-artifact data. Front. Neurosci. 12, 97. doi: 10.3389/fnins.2018.00097.
Goswami, U., 2019. Speech rhythm and language acquisition: an amplitude mod-
ulation phase hierarchy perspective. Ann. N. Y. Acad. Sci. 1453, 67–78.
doi: 10.1111/nyas.14137 .
Graf Estes, K., Hurley, K., 2013. Infant-directed prosody helps infants map sounds to meanings. Infancy 18 (5), 797–824. doi: 10.1111/infa.12006.
Green, J.R., Nip, I.S.B., Wilson, E.M., Mefferd, A.S., Yunusova, Y., 2010. Lip movement exaggerations during infant-directed speech. Journal of Speech, Language, and Hearing Research 53 (6), 1529–1542. doi: 10.1044/1092-4388(2010/09-0005).
Grieser, D.L., Kuhl, P.K., 1988. Maternal speech to infants in a tonal language:
support for universal prosodic features in motherese. Dev Psychol 24 (1), 14.
doi: 10.1037/0012-1649.24.1.14 .
Gross, J., Hoogenboom, N., Thut, G., Schyns, P., Panzeri, S., Belin, P., Garrod, S., 2013.
Speech rhythms and multiplexed oscillatory sensory coding in the human brain. PLoS
Biol. 11 (12). doi: 10.1371/journal.pbio.1001752 .
Háden, G.P., Mády, K., Török, M., Winkler, I., 2020. Newborn infants differently process adult directed and infant directed speech. International Journal of Psychophysiology 147, 107–112. doi: 10.1016/j.ijpsycho.2019.10.011.
Hoehl, S. , Michel, C. , Reid, V.M. , Parise, E. , Striano, T. , 2014. Eye contact during live social
interaction modulates infants’ oscillatory brain activity. Soc Neurosci 9 (3), 300–308 .
Jessen, S., Fiedler, L., Münte, T.F., Obleser, J., 2019. Quantifying the individual auditory
and visual brain response in 7-month-old infants watching a brief cartoon movie.
Neuroimage 202, 116060. doi: 10.1016/j.neuroimage.2019.116060 .
Jessen, S., Obleser, J., Tune, S., 2021. Neural tracking in infants–an analytical tool
for multisensory social processing in development. Dev Cogn Neurosci 101034.
doi: 10.1016/j.dcn.2021.101034 .
Junge, C., Cutler, A., Hagoort, P., 2014. Successful word recognition by 10-month-olds given continuous speech both at initial exposure and test. Infancy 19 (2), 179–193.
Junge, C., Kooijman, V., Hagoort, P., Cutler, A., 2012. Rapid recognition at 10 months as a predictor of language development. Dev Sci 15 (4), 463–473. doi: 10.1111/j.1467-7687.2012.01144.x.
Jusczyk, P.W., Houston, D.M., Newsome, M., 1999. The beginnings of word segmentation in English-learning infants. Cogn Psychol 39 (3–4), 159–207. doi: 10.1006/cogp.1999.0716.
Kalashnikova, M., Burnham, D., 2018. Infant-directed speech from seven to nineteen months has similar acoustic properties but different functions. J Child Lang 45 (5), 1035–1053. doi: 10.1017/S0305000917000629.
Kalashnikova, M., Goswami, U., Burnham, D., 2020. Infant-directed speech to in-
fants at risk for dyslexia: a novel cross-dyad design. Infancy 25 (3), 286–303.
doi: 10.1111/infa.12329 .
Kalashnikova, M., Peter, V., Di Liberto, G.M., Lalor, E.C., Burnham, D., 2018. Infant-
directed speech facilitates seven-month-old infants’ cortical tracking of speech. Sci
Rep 8 (1), 1–8. doi: 10.1038/s41598-018-32150-6 .
Kaplan, P.S., Goldstein, M.H., Huckeby, E.R., Owren, M.J., Cooper, R.P., 1995. Dishabituation of visual attention by infant- versus adult-directed speech: effects of frequency modulation and spectral composition. Infant Behavior and Development 18 (2), 209–223. doi: 10.1016/0163-6383(95)90050-0.
Katz, G.S., Cohn, J.F., Moore, C.A., 1996. A combination of vocal f0 dynamic and summary features discriminates between three pragmatic categories of infant-directed speech. Child Dev 67 (1), 205–217. doi: 10.1111/j.1467-8624.1996.tb01729.x.
Keitel, A., Ince, R.A., Gross, J., Kayser, C., 2017. Auditory cortical delta-entrainment in-
teracts with oscillatory power in multiple fronto-parietal networks. Neuroimage 147,
32–42. doi: 10.1016/j.neuroimage.2016.11.062 .
Keitel, C., Obleser, J., Jessen, S., Henry, M.J., 2021. Frequency-specific effects in infant electroencephalograms do not require entrained neural oscillations: a commentary on Köster et al. (2019). Psychol Sci, 09567976211001317.
Kitamura, C., Burnham, D., 2003. Pitch and communicative intent in mother’s speech: adjustments for age and sex in the first year. Infancy 4 (1), 85–110. doi: 10.1207/S15327078IN0401_5.
Kooijman, V., Hagoort, P., Cutler, A., 2009. Prosodic structure in early word segmentation: ERP evidence from Dutch ten-month-olds. Infancy 14 (6), 591–612.
Kooijman, V., Junge, C., Johnson, E.K., Hagoort, P., Cutler, A., 2013. Predictive brain signals of linguistic development. Front Psychol 4, 25.
Lakatos, P., Musacchia, G., O’Connel, M.N., Falchier, A.Y., Javitt, D.C., Schroeder, C.E., 2013. The spectrotemporal filter mechanism of auditory selective attention. Neuron 77 (4), 750–761. doi: 10.1016/j.neuron.2012.11.034.
Lam-Cassettari, C., Kohlhoff, J., 2020. Effect of maternal depression on infant-directed speech to prelinguistic infants: implications for language development. PLoS ONE 15 (7), e0236787.
Leong, V., Goswami, U., 2015. Acoustic-emergent phonology in the amplitude envelope of child-directed speech. PLoS ONE 10 (12), 1–37. doi: 10.1371/journal.pone.0144411.
Leong, V., Kalashnikova, M., Burnham, D., Goswami, U., 2017. The temporal modulation structure of infant-directed speech. Open Mind 1 (2), 78–90. doi: 10.1162/OPMI_a_00008.
Liu, Y., Kaaya, S., Chai, J., McCoy, D., Surkan, P., Black, M., Sutter-Dallay, A.-L., Verdoux, H., Smith-Fawzi, M., 2017. Maternal depressive symptoms and early childhood cognitive development: a meta-analysis. Psychol Med 47 (4), 680–689.
Lloyd-Fox, S., Széplaki-Köllőd, B., Yin, J., Csibra, G., 2015. Are you talking to me? neural activations in 6-month-old infants in response to being addressed during natural interactions. Cortex 70, 35–48.
Männel, C., Friederici, A.D., 2013. Accentuate or repeat? brain signatures of de-
velopmental periods in infant word recognition. Cortex 49 (10), 2788–2798.
doi: 10.1016/j.cortex.2013.09.003 .
Martin, A., Igarashi, Y., Jincho, N., Mazuka, R., 2016. Utterances in infant-directed speech are shorter, not slower. Cognition 156, 52–59. doi: 10.1016/j.cognition.2016.07.015.
Meyer, L., 2018. The neural oscillations of speech processing and language comprehension: state of the art and emerging mechanisms. European Journal of Neuroscience 48 (7), 2609–2621. doi: 10.1111/ejn.13748.
Meyer, L., Sun, Y., Martin, A.E., 2019. Synchronous, but not entrained: exogenous and
endogenous cortical rhythms of speech and language processing. Lang Cogn Neurosci
0 (0), 1–11. doi: 10.1080/23273798.2019.1693050 .
Michel, C., Matthes, D., Hoehl, S., 2021. Neural and behavioral correlates of ostensive
cues in naturalistic mother-infant interactions. Manuscript in preparation.
Obleser, J., Kayser, C., 2019. Neural entrainment and attentional selection
in the listening brain. Trends Cogn. Sci. (Regul. Ed.) 23 (11), 913–926.
doi: 10.1016/j.tics.2019.08.004 .
Oostenveld, R., Fries, P., Maris, E., Schoffelen, J.-M., 2011. FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Comput Intell Neurosci 2011. doi: 10.1155/2011/156869.
Ortiz Barajas, M.C., Guevara, R., Gervain, J., 2021. The origins and development of speech envelope tracking during the first months of life. Dev Cogn Neurosci 48, 100915. doi: 10.1016/j.dcn.2021.100915.
Parks, T., McClellan, J., 1972. Chebyshev approximation for nonrecursive digital filters with linear phase. IEEE Transactions on Circuit Theory 19 (2), 189–194.
Peelle, J.E., Gross, J., Davis, M.H., 2013. Phase-locked responses to speech in human audi-
tory cortex are enhanced during comprehension. Cerebral Cortex 23 (6), 1378–1387.
doi: 10.1093/cercor/bhs118 .
Power, A.J., Mead, N., Barnes, L., Goswami, U., 2012. Neural entrainment to rhythmically presented auditory, visual, and audio-visual speech in children. Front Psychol 3, 216.
Ramírez-Esparza, N., García-Sierra, A., Kuhl, P.K., 2014. Look who’s talking: speech style
and social context in language input to infants are linked to concurrent and future
speech development. Dev Sci 17 (6), 880–891. doi: 10.1111/desc.12172 .
Raneri, D., Von Holzen, K., Newman, R., Bernstein Ratner, N., 2020. Change in maternal speech rate to preverbal infants over the first two years of life. J Child Lang 47 (6), 1263–1275. doi: 10.1017/S030500091900093X.
Reissland, N., Stephenson, T., 1999. Turn-taking in early vocal interaction: a comparison of premature and term infants’ vocal interaction with their mothers. Child Care Health Dev 25 (6), 447–456. doi: 10.1046/j.1365-2214.1999.00109.x.
Rimmele, J.M., Morillon, B., Poeppel, D., Arnal, L.H., 2018. Proactive sensing of periodic
and aperiodic auditory patterns. Trends Cogn. Sci. (Regul. Ed.) 22 (10), 870–882.
doi: 10.1016/j.tics.2018.08.003 .
Roberts, S., Fyeld, R., Baibazarova, E., van Goozen, S., Culling, J.F., Hay, D.F., 2013.
Parental speech at 6 months predicts joint attention at 12 months. Infancy 18, E1–
E15. doi: 10.1111/infa.12018 .
Schreiner, M.S., Mani, N., 2017. Listen up! developmental dierences in
the impact of IDS on speech segmentation. Cognition 160, 98–102.
doi: 10.1016/j.cognition.2016.12.003 .
Singh, L., Nestor, S., Parikh, C., Yull, A., 2009. Inuences of infant-directed speech on
early word recognition. Infancy 14 (6), 654–666. doi: 10.1080/15250000903263973 .
Smith, N.A., Strader, H.L., 2014. Infant-directed visual prosody: Mothers’ head movements and speech acoustics. Interaction Studies 15 (1), 38–54.
Smith, S.M., Nichols, T.E., 2009. Threshold-free cluster enhancement: addressing problems of smoothing, threshold dependence and localisation in cluster inference. Neuroimage 44 (1), 83–98. doi: 10.1016/j.neuroimage.2008.03.061.
Smith, Z.M., Delgutte, B., Oxenham, A.J., 2002. Chimaeric sounds reveal dichotomies in auditory perception. Nature 416 (6876), 87–90. doi: 10.1038/416087a.
Soderstrom, M., 2007. Beyond babytalk: re-evaluating the nature and content of
speech input to preverbal infants. Developmental Review 27 (4), 501–532.
doi: 10.1016/j.dr.2007.06.002 .
Soderstrom, M., Blossom, M., Foygel, R., Morgan, J.L., 2008. Acoustical cues and gram-
matical units in speech to two preverbal infants. J Child Lang 35 (4), 869–902.
doi: 10.1017/S0305000908008763 .
Spinelli, M., Fasolo, M., Mesman, J., 2017. Does prosody make the difference? a meta-analysis on relations between prosodic aspects of infant-directed speech and infant outcomes. Developmental Review 44, 1–18. doi: 10.1016/j.dr.2016.12.001.
Stärk, K., Kidd, E., Frost, R.L., 2021. Word segmentation cues in German child-directed speech: a corpus analysis. Lang Speech. doi: 10.1177/0023830920979016.
Stein, A., Malmberg, L.-E., Sylva, K., Barnes, J., Leach, P., the FCCC team, 2008. The influence of maternal depression, caregiving, and socioeconomic status in the post-natal year on children’s language development. Child Care Health Dev 34 (5), 603–612.
Swerts, M., Krahmer, E., 2010. Visual prosody of newsreaders: effects of information structure, emotional content and intended audience on facial expressions. J Phon 38 (2), 197–206.
Ten Oever, S., Martin, A.E., 2021. An oscillating computational model can track pseudo-rhythmic speech by using linguistic predictions. eLife 10, e68066.
Thiessen, E.D., Hill, E.A., Saffran, J.R., 2005. Infant-directed speech facilitates word segmentation. Infancy 7 (1), 53–71. doi: 10.1207/s15327078in0701_5.
Weisleder, A., Fernald, A., 2013. Talking to children matters: early language experi-
ence strengthens processing and builds vocabulary. Psychol Sci 24 (11), 2143–2152.
doi: 10.1177/0956797613488145 .
Winkler, I., Haufe, S., Tangermann, M., 2011. Automatic classification of artifactual ICA components for artifact removal in EEG signals. Behavioral and Brain Functions 7 (1), 1–15. doi: 10.1186/1744-9081-7-30.
Zangl, R., Mills, D.L., 2007. Increased brain activity to infant-directed speech in 6- and 13-month-old infants. Infancy 11 (1), 31–62.
... Since early infancy, infants have shown a preference for IDS over ADS (Dunst et al. 2012; ManyBabies Consortium 2020). Compared to ADS, IDS facilitates infants' cortical tracking of the speech envelope at 7 months of age (Kalashnikova et al. 2018) and of prosodic stress at 9 months of age (Menn et al. 2022), as well as processing of formant exaggeration (Zhang et al. 2011) and word learning (Zhou et al. 2023). What remains unknown is how mothers' use of ADS and IDS modulates father-infant IBC. ...
... Most infants 7-9 months of age prefer their mothers' IDS, which typically has higher pitch profiles (e.g., Figure 3) and enhanced positive affect during interaction (Benders 2013), compared to ADS (see the review by Dunst et al. 2012). Increased and joint attention to maternal storytelling in IDS (Gvirts and Perlmutter 2020) could better align infants and strangers through greater cortical responses in frontotemporal areas (Naoi et al. 2012; Saito et al. 2007; Zhou et al. 2023) and improved neural tracking of speech (Kalashnikova et al. 2018; Leong et al. 2017; Menn et al. 2022; Zhang et al. 2011), compared to ADS. Further, compared to the ADS condition, we also found greater IBC in the IDS condition between infants' FC and TP on the left hemisphere (ΔHbO measures) or on the right hemisphere (ΔHbR measures) and all the ROIs of strangers, and vice versa. ...
Article
Full-text available
The current study examined inter-brain coherence (IBC) between 34 dyads of fathers and infants 7-9 months of age using functional near-infrared spectroscopy (fNIRS). We specifically focused on father-infant IBC to broaden the empirical base beyond mother-infant connections, as the former has received limited attention. There were three conditions: a baseline condition and two task conditions in which the infant and the adult participant jointly listened to maternal storytelling in Cantonese in infant-directed speech (IDS) and adult-directed speech (ADS). Father-infant IBC was compared with stranger-infant IBC in the same experimental settings. We found that father-infant IBC was greater than stranger-infant IBC in the baseline and ADS conditions but not in the IDS condition. Further, stranger-infant dyads showed greater IBC in the IDS condition than in the ADS condition, with no significant difference in father-infant IBC between the two speech conditions. These results identify different inter-brain connection mechanisms for the two dyad types. The IBC pattern in stranger-infant dyads is driven by neural entrainment to mothers' speech, whereas father-infant IBC is more resistant to mothers' behaviour in the co-presence of both parents.
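The fNIRS pipeline itself is not reproduced in this excerpt; as a minimal sketch, assuming two preprocessed ΔHbO time series, an illustrative 10 Hz sampling rate, and an illustrative slow frequency band, inter-brain coherence between one infant channel and one adult channel can be approximated with magnitude-squared coherence (the study may well have used a different estimator, e.g., a wavelet-based one):

```python
# Minimal sketch: inter-brain coherence (IBC) between two fNIRS channels.
# The signals, sampling rate, and band are illustrative assumptions.
import numpy as np
from scipy.signal import coherence

fs = 10.0  # Hz; a typical fNIRS sampling rate (assumption)
rng = np.random.default_rng(0)
infant_hbo = rng.standard_normal(6000)                    # placeholder infant ΔHbO
adult_hbo = 0.5 * infant_hbo + rng.standard_normal(6000)  # partly coupled adult ΔHbO

# Magnitude-squared coherence between the two time series
f, cxy = coherence(infant_hbo, adult_hbo, fs=fs, nperseg=1024)

# Average within a slow band of the kind examined in fNIRS hyperscanning (assumption)
band = (f >= 0.02) & (f <= 0.2)
print(f"mean IBC in 0.02-0.2 Hz: {cxy[band].mean():.3f}")
```

In practice, dyad-level IBC would then be averaged over channel pairs within regions of interest and compared across conditions.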
... This interpretation is in line with Bertels et al. [53], who found that delta (phrasal) speech-brain coherence was comparable to adults from 5 years of age, while theta matured later, between 7.5 and 10 years of age. However, in another recent study, Menn et al. [54] did find significant speech-brain coherence at the syllable rate in infants as young as 9 months of age. Task-requirement differences could explain these divergent findings between Menn et al. [54] and our study in the theta frequency band. In Menn and collaborators' study, infants' CTS was measured while they were listening to their mothers in relatively short (40 s) live interactions (audiovisual stimuli), which could have boosted infants' attention during those transient interactions, whereas in our study, CTS metrics were extracted from considerably longer 15-min continuous stories with static images (very limited visual information). ...
Article
Full-text available
Cortical tracking of speech is relevant for the development of speech perception skills. However, no study to date has explored whether and how cortical tracking of speech is shaped by accumulated language experience, the central question of this study. In 35 bilingual 6-year-old children with considerably greater experience in one of their languages, we collected electroencephalography data while they listened to continuous speech in their two languages. Cortical tracking of speech was assessed at acoustic-temporal and lexico-semantic levels. Children showed more robust acoustic-temporal tracking in the less experienced language, and more sensitive cortical tracking of semantic information in the more experienced language. Additionally, and only for the more experienced language, acoustic-temporal tracking was specifically linked to phonological abilities, and lexico-semantic tracking to vocabulary knowledge. Our results indicate that accumulated linguistic experience is a relevant maturational factor for the cortical tracking of speech at different levels during early language acquisition.
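Acoustic-temporal tracking of this kind is commonly quantified as coherence between the speech amplitude envelope and the EEG. The sketch below shows the basic pipeline of envelope extraction followed by speech-brain coherence; the sampling rates, band limits, and synthetic signals are illustrative assumptions, not the authors' exact parameters:

```python
# Minimal sketch: speech-brain coherence between a speech envelope and EEG.
# Audio, EEG, rates, and bands are illustrative placeholders.
import numpy as np
from scipy.signal import hilbert, coherence, resample

fs_audio, fs_eeg = 16000, 250
rng = np.random.default_rng(1)
audio = rng.standard_normal(60 * fs_audio)  # placeholder 60-s speech recording
eeg = rng.standard_normal(60 * fs_eeg)      # placeholder single EEG channel

envelope = np.abs(hilbert(audio))           # broadband amplitude envelope
envelope = resample(envelope, eeg.size)     # bring the envelope to the EEG rate

f, cxy = coherence(envelope, eeg, fs=fs_eeg, nperseg=2048)

# Inspect coherence near prosodic-stress and syllable rates (illustrative bands)
for lo, hi, label in [(1.0, 2.0, "stress"), (3.0, 5.0, "syllable")]:
    sel = (f >= lo) & (f <= hi)
    print(f"{label} rate: mean coherence = {cxy[sel].mean():.3f}")
```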
... These neural data from children suggest that cortical tracking within specific low-frequency bands (i.e., delta, 0.5–4 Hz, and theta, 4–8 Hz) and their interactive dynamics may be central to language acquisition by infants [13,37,38]. ...
Article
Full-text available
Cortical signals have been shown to track acoustic and linguistic properties of continuous speech. This phenomenon has been measured in both children and adults, reflecting speech understanding by adults as well as cognitive functions such as attention and prediction. Furthermore, atypical low-frequency cortical tracking of speech is found in children with phonological difficulties (developmental dyslexia). Accordingly, low-frequency cortical signals may play a critical role in language acquisition. A recent investigation with infants (Attaheri et al., 2022 [1]) probed cortical tracking mechanisms at the ages of 4, 7 and 11 months as participants listened to sung speech. Results from temporal response function (TRF), phase-amplitude coupling (PAC) and dynamic theta-delta power (PSD) analyses indicated speech envelope tracking and stimulus-related power for delta and theta neural signals. Furthermore, delta- and theta-driven PAC was found at all ages, with theta phases displaying stronger PAC with high-frequency amplitudes than delta phases. The present study tests whether these previous findings replicate in the second half of the full cohort of infants (N = 122) participating in this longitudinal study (first half: N = 61 [1]; second half: N = 61). In addition to demonstrating good replication, we investigate whether cortical tracking in the first year of life predicts later language acquisition for the full cohort (122 infants recruited, 113 retained) using both infant-led and parent-estimated measures and multivariate and univariate analyses. Increased delta cortical tracking in the univariate analyses, increased ~2 Hz PSD power, and stronger theta-gamma PAC in both multivariate and univariate analyses were related to better language outcomes using both infant-led and parent-estimated measures. By contrast, increased ~4 Hz PSD power in the multivariate analyses, increased delta-beta PAC, and a higher theta/delta power ratio in the multivariate analyses were related to worse language outcomes. The data are interpreted within a “Temporal Sampling” framework for developmental language trajectories.
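Phase-amplitude coupling of the kind reported here is often quantified with a mean-vector-length modulation index: the phase of a slow band and the amplitude of a fast band are extracted via the Hilbert transform, and coupling is the length of the amplitude-weighted mean phase vector. A minimal sketch, with synthetic data and generic filter bands that are assumptions rather than the study's settings:

```python
# Minimal sketch: theta-gamma phase-amplitude coupling via mean vector length.
# The EEG is synthetic and the filter bands are generic assumptions.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

fs = 250.0
rng = np.random.default_rng(2)
eeg = rng.standard_normal(int(120 * fs))  # placeholder 2-min EEG channel

def bandpass(x, lo, hi, fs, order=4):
    sos = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
    return sosfiltfilt(sos, x)

theta_phase = np.angle(hilbert(bandpass(eeg, 4.0, 8.0, fs)))  # slow-band phase
gamma_amp = np.abs(hilbert(bandpass(eeg, 30.0, 45.0, fs)))    # fast-band amplitude

# Modulation index: length of the amplitude-weighted mean phase vector
mi = np.abs(np.mean(gamma_amp * np.exp(1j * theta_phase)))
print(f"theta-gamma PAC (mean vector length): {mi:.4f}")
```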
... It has been suggested that CTS is an important part of speech processing because it helps separate and decode continuous speech signals into linguistic units at different timescales (Ahissar et al., 2001; Giraud and Poeppel, 2012; Peelle and Davis, 2012; Peelle et al., 2013; Zoefel and VanRullen, 2015; Ding et al., 2016; Keitel, Gross, and Kayser, 2018; Kosem et al., 2018; Meyer and Gumbert, 2018; Lizarazu, Carreiras and Molinaro, 2023). CTS can be observed throughout the lifespan (Bertels et al., 2023), from newborns (Menn et al., 2022; Ortiz-Barajas, Guevara and Gervain, 2023) to older adults (Henry et al., 2017). Furthermore, atypical CTS has been associated with language impairments, as evidenced by studies on hearing loss (Decruy et al., 2020; Gillis et al., 2022; Kurthen et al., 2021), stroke-related or dementia-related aphasias (Dial et al., 2021; Kries et al., 2023; Quique et al., 2023), dyslexia (Lizarazu et al., 2015, 2021a, 2021b; Molinaro et al., 2016; Lallier et al., 2017, 2018; Rios-López et al., 2017; Schwarz et al., 2024) and specific language impairments (Kaganovich et al., 2014). ...
... In comparison with the clear and repetitive stimuli presented in experimental paradigms, ostensive signals during free-flowing interactions are generally rapid, unpredictable, and noncontiguous (Fogel & Garvey 2007) and therefore potentially difficult to track during real-world interactions (Yu & Smith 2017). These and other recent findings (Çetinçelik et al. 2022, Menn et al. 2022, Tan et al. 2022) are giving rise to a theoretical shift away from approaches that emphasize infants' inbuilt sensitivity to social signals, in particular eye gaze signals, during early attentional development and toward an approach that emphasizes the role of culturally situated social interactions and embodied explorations in driving the development of attention (Schroer & Yu 2022, Suarez-Rivera et al. 2019, Yu & Smith 2013). ...
Article
Full-text available
In this article we examine how contingency and synchrony during infant–caregiver interactions help children learn to pay attention to objects and how this, in turn, affects their ability to direct caregivers’ attention and to track communicative intentions in others. First, we present evidence that, early in life, child–caregiver interactions are asymmetric. Caregivers dynamically and contingently adapt to their child more than the other way around, providing higher-order semantic and contextual cues during attention episodes, which facilitate the development of specialized and integrated attentional brain networks in the infant brain. Then, we describe how social contingency also facilitates the child's development of predictive models and, through that, goal-directed behavior. Finally, we discuss how contingency and synchrony of brain and behavior can drive children's ability to direct their caregivers’ attention voluntarily and how this, in turn, paves the way for intentional communication.
... As discussed previously, the brain can entrain to prosodic cues very early in development (Martinez-Alvarez et al., 2023; Menn et al., 2022; Riva et al., 2018). Typically developing infants are also capable of synchronizing their spontaneous movements to external sounds. ...
... Moreover, ID speech has been found to facilitate cortical tracking of speech input in 7-month-old infants, which likely involves attentional mechanisms (Kalashnikova et al., 2018). Cortical tracking of nursery rhymes sung in ID speech is evident from 4 months of age (Attaheri et al., 2022), and is likely based on enhanced prosodic stress (Menn et al., 2022). ...
Article
This study investigates attention modulation as a function of infant-directed (ID) versus adult-directed (AD) speech in seven-month-old infants using electroencephalographic measures. In three experiments, infants were presented with either ID or AD speech stimuli, followed by highly variable images of inanimate objects as targets. In Experiment 1 (N = 18), images were preceded by ID or AD speech with semantic content (“Look here”). Contrary to the hypothesis, targets preceded by AD speech elicited an increased amplitude of the Negative central (Nc) component compared to targets preceded by ID speech, indicating increased attention. Experiment 2 (N = 23) explored whether ID versus AD speech influences attention allocation even without semantic content. The same targets were preceded by human voice sounds without semantic content (“Uh-Ah”), following the prosody of either the ID or the AD speech register. No differences in attention allocation or object processing were observed. Experiment 3 (N = 18) contrasted ID speech with and without semantic content and found enhanced attention allocation following stimuli without semantic content, but increased object processing following stimuli with semantic content. Overall, the effects observed here are consistent with the idea that less familiar speech stimuli increase attention to subsequent objects. Semantic content increased the depth of object processing in 7-month-olds.
Preprint
Full-text available
Intonation units (IUs) are a universal building-block of human speech. They are found cross-linguistically and are tied to important language functions such as the pacing of information in discourse and swift turn-taking. We study the rate of IUs in 48 languages from every continent and from 27 distinct language families. Using a novel analytic method to annotate natural speech recordings, we identify a low-frequency rate of IUs across the sample, with a peak at 0.6 Hz, and little variation between sexes or across the life span. We find that IU rate is only weakly related to speech rate quantified at the syllable level (SR), and crucially, that cross-linguistic variation in IU rate is smaller than cross-linguistic variation in SR. Since SR was shown to be inversely related to information density quantified at the syllable level, we suggest that across languages, IUs are more balanced than syllables in their information content.
Article
While infants’ sensitivity to visual speech cues and the benefit of these cues have been well established by behavioural studies, there is little evidence on the effect of visual speech cues on infants’ neural processing of continuous auditory speech. In this study, we investigated whether visual speech cues, such as the movements of the lips, jaw, and larynx, facilitate infants’ neural speech tracking. Ten-month-old Dutch-learning infants watched videos of a speaker reciting passages in infant-directed speech while electroencephalography (EEG) was recorded. In the videos, either the full face of the speaker was displayed or the speaker’s mouth and jaw were masked with a block, obstructing the visual speech cues. To assess neural tracking, speech-brain coherence (SBC) was calculated, focusing particularly on the stress and syllabic rates (1–1.75 and 2.5–3.5 Hz, respectively, in our stimuli). First, overall SBC was compared to surrogate data; then, differences in SBC between the two conditions were tested at the frequencies of interest. Our results indicated that infants show significant tracking at both stress and syllabic rates. However, no differences were identified between the two conditions, meaning that infants’ neural tracking was not further modulated by the presence of visual speech cues. Furthermore, we demonstrated that infants’ neural tracking of low-frequency information is related to their subsequent vocabulary development at 18 months. Overall, this study provides evidence that infants’ neural tracking of speech is not necessarily impaired when visual speech cues are not fully visible and that neural tracking may be a potential mechanism in successful language acquisition.
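The surrogate comparison mentioned in this abstract can be sketched as a permutation test: coherence is recomputed after circularly shifting the EEG relative to the envelope, which destroys temporal alignment while preserving both spectra. The signals, shift counts, and segment lengths below are illustrative assumptions; only the stress-rate band limits come from the abstract:

```python
# Minimal sketch: surrogate (permutation) test for speech-brain coherence.
# Signals and shift parameters are illustrative assumptions.
import numpy as np
from scipy.signal import coherence

fs = 250.0
rng = np.random.default_rng(3)
envelope = rng.standard_normal(int(60 * fs))  # placeholder speech envelope
eeg = rng.standard_normal(envelope.size)      # placeholder EEG channel

def band_coherence(x, y, lo, hi):
    f, cxy = coherence(x, y, fs=fs, nperseg=2048)
    sel = (f >= lo) & (f <= hi)
    return cxy[sel].mean()

observed = band_coherence(envelope, eeg, 1.0, 1.75)  # stress-rate band

# Null distribution: circular shifts destroy alignment but keep both spectra
n_surrogates = 200
null = np.empty(n_surrogates)
for i in range(n_surrogates):
    shift = int(rng.integers(int(5 * fs), eeg.size - int(5 * fs)))
    null[i] = band_coherence(envelope, np.roll(eeg, shift), 1.0, 1.75)

p = (np.sum(null >= observed) + 1) / (n_surrogates + 1)
print(f"observed coherence = {observed:.3f}, permutation p = {p:.3f}")
```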
Chapter
Full-text available
In recent years there has been a shift within developmental psychology away from examining the cognitive systems at different ages and toward trying to understand exactly what the mechanisms are that generate change. What kinds of learning mechanisms and representational changes drive cognitive development? How can the available imaging techniques help us to understand these mechanisms? This new volume in the highly cited and critically acclaimed Attention and Performance series is the first to provide a systematic investigation into the processes of change in mental development. It brings together world-class scientists to address brain and cognitive development at several different levels, including phylogeny, genetics, neurophysiology, brain imaging, behavior, and computational modeling, across both typically and atypically developing populations. Presenting original new research from the frontiers of cognitive neuroscience, this book will have a substantial impact on this field, as well as on developmental psychology and developmental neuroscience.
Article
Full-text available
Humans are born into a social environment and from early on possess a range of abilities to detect and respond to social cues. In the past decade, there has been a rapidly increasing interest in investigating the neural responses underlying such early social processes under naturalistic conditions. However, the investigation of neural responses to continuous dynamic input poses the challenge of how to link neural responses back to continuous sensory input. In the present tutorial, we provide a step-by-step introduction to one approach to tackle this issue, namely the use of linear models to investigate neural tracking responses in electroencephalographic (EEG) data. While neural tracking has gained increasing popularity in adult cognitive neuroscience over the past decade, its application to infant EEG is still rare and comes with its own challenges. After introducing the concept of neural tracking, we discuss and compare the use of forward vs. backward models and individual vs. generic models using an example data set of infant EEG data. Each section comprises a theoretical introduction as well as a concrete example using MATLAB code. We argue that neural tracking provides a promising way to investigate early (social) processing in an ecologically valid setting.
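A forward model of the kind such tutorials describe can be sketched as time-lagged ridge regression from the stimulus envelope to an EEG channel. The lag range and regularization strength below are illustrative assumptions, not prescriptions from the tutorial:

```python
# Minimal sketch: a forward temporal response function (TRF) via ridge regression.
# Data are synthetic; lag range and regularization are illustrative assumptions.
import numpy as np

fs = 100.0
rng = np.random.default_rng(4)
envelope = rng.standard_normal(int(120 * fs))  # placeholder stimulus envelope
eeg = rng.standard_normal(envelope.size)       # placeholder EEG channel

# Design matrix of lagged envelope copies (0-400 ms)
lags = np.arange(int(0.4 * fs))
X = np.column_stack([np.roll(envelope, lag) for lag in lags])
X[: lags.max()] = 0.0  # discard samples wrapped around by np.roll

# Ridge solution: w = (X'X + lambda * I)^(-1) X'y
lam = 1e2
w = np.linalg.solve(X.T @ X + lam * np.eye(lags.size), X.T @ eeg)

# The weights approximate the TRF; prediction accuracy indexes envelope tracking
r = np.corrcoef(X @ w, eeg)[0, 1]
print(f"TRF with {lags.size} lags, prediction r = {r:.3f}")
```

A backward model simply swaps the roles, reconstructing the envelope from lagged copies of many EEG channels.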
Article
Full-text available
Neuronal oscillations putatively track speech in order to optimize sensory processing. However, it is unclear how isochronous brain oscillations can track pseudo-rhythmic speech input. Here we propose that oscillations can track pseudo-rhythmic speech when considering that speech timing depends on content-based predictions flowing from internal language models. We show that the temporal dynamics of speech depend on the predictability of words in a sentence. A computational model including oscillations, feedback, and inhibition is able to track pseudo-rhythmic speech input. As the model processes the input, it generates temporal phase codes, which are a candidate mechanism for carrying information forward in time. The model is optimally sensitive to the natural temporal dynamics of speech and can explain empirical data on temporal speech illusions. Our results suggest that speech tracking does not have to rely only on the acoustics but could also exploit ongoing interactions between oscillations and constraints flowing from internal language models.
Article
Full-text available
To acquire language, infants must learn to segment words from running speech. A significant body of experimental research shows that infants use multiple cues to do so; however, little research has comprehensively examined the distribution of such cues in naturalistic speech. We conducted a comprehensive corpus analysis of German child-directed speech (CDS) using data from the Child Language Data Exchange System (CHILDES) database, investigating the availability of word stress, transitional probabilities (TPs), and lexical and sublexical frequencies as potential cues for word segmentation. Seven hours of data (~15,000 words) were coded, representing around an average day of speech to infants. The analysis revealed that for 97% of words, primary stress was carried by the initial syllable, implicating stress as a reliable cue to word onset in German CDS. Word identity was also marked by TPs between syllables, which were higher within than between words, and higher for backwards than forwards transitions. Words followed a Zipfian-like frequency distribution, and over two-thirds of words (78%) were monosyllabic. Of the 50 most frequent words, 82% were function words, which accounted for 47% of word tokens in the entire corpus. Finally, 15% of all utterances comprised single words. These results give rich novel insights into the availability of segmentation cues in German CDS, and support the possibility that infants draw on multiple converging cues to segment their input. The data, which we make openly available to the research community, will help guide future experimental investigations on this topic.
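The transitional-probability cue analyzed in this corpus study has a direct definition: the forward TP of a syllable pair xy is count(xy)/count(x), and the backward TP is count(xy)/count(y). A minimal sketch over an invented toy syllable stream (the stream and the "words" in it are made up purely for illustration):

```python
# Minimal sketch: forward and backward transitional probabilities over syllables.
from collections import Counter

# Invented toy stream: two recurring "words" (ba-by, do-ggy) in running speech
syllables = ["ba", "by", "do", "ggy", "ba", "by", "ba", "by", "do", "ggy"]

unigrams = Counter(syllables)
bigrams = Counter(zip(syllables, syllables[1:]))

def forward_tp(x, y):
    # P(y follows x) = count(xy) / count(x)
    return bigrams[(x, y)] / unigrams[x]

def backward_tp(x, y):
    # P(x precedes y) = count(xy) / count(y)
    return bigrams[(x, y)] / unigrams[y]

print(f"TP(ba->by) = {forward_tp('ba', 'by'):.2f}")  # high: within-word transition
print(f"TP(by->do) = {forward_tp('by', 'do'):.2f}")  # lower: across a word boundary
print(f"backward TP(ba->by) = {backward_tp('ba', 'by'):.2f}")
```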
Article
Full-text available
When humans listen to speech, their neural activity tracks the slow amplitude fluctuations of the speech signal over time, known as the speech envelope. Studies suggest that the quality of this tracking is related to the quality of speech comprehension. However, a critical unanswered question is how envelope tracking arises and what role it plays in language development. Relatedly, its causal role in comprehension remains unclear, as some studies have found it to be present even for unintelligible speech. Using electroencephalography, we investigated whether the neural activity of newborns and 6-month-olds is able to track the speech envelope of familiar and unfamiliar languages in order to explore the developmental origins and functional role of envelope tracking. Our results show that amplitude and phase tracking takes place at birth for familiar and unfamiliar languages alike, i.e., independently of prenatal experience. However, by 6 months language familiarity modulates the ability to track the amplitude of the speech envelope, while phase tracking continues to be universal. Our findings support the hypothesis that amplitude and phase tracking could represent two different neural mechanisms of oscillatory synchronisation and may thus play different roles in speech perception.
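The distinction drawn here between amplitude and phase tracking can be illustrated with two simple metrics computed from band-limited analytic signals: a correlation between amplitude envelopes (amplitude tracking) and a phase-locking value between instantaneous phases (phase tracking). The band, sampling rate, and synthetic signals below are assumptions for illustration:

```python
# Minimal sketch: amplitude tracking vs. phase tracking of the speech envelope.
# Signals, band, and sampling rate are illustrative assumptions.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

fs = 250.0
rng = np.random.default_rng(5)
envelope = rng.standard_normal(int(60 * fs))  # placeholder speech envelope
eeg = rng.standard_normal(envelope.size)      # placeholder EEG channel

def band_analytic(x, lo, hi, fs, order=3):
    sos = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band", output="sos")
    return hilbert(sosfiltfilt(sos, x))

env_a = band_analytic(envelope, 1.0, 4.0, fs)  # delta-band analytic signals
eeg_a = band_analytic(eeg, 1.0, 4.0, fs)

# Amplitude tracking: correlation of the two amplitude time courses
amp_r = np.corrcoef(np.abs(env_a), np.abs(eeg_a))[0, 1]

# Phase tracking: phase-locking value between the two phase time courses
plv = np.abs(np.mean(np.exp(1j * (np.angle(eeg_a) - np.angle(env_a)))))
print(f"amplitude tracking r = {amp_r:.3f}, phase-locking value = {plv:.3f}")
```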
Article
Full-text available
Eye gaze is a ubiquitous cue in child–caregiver interactions, and infants are highly attentive to eye gaze from very early on. However, the question of why infants show gaze-sensitive behavior, and what role this sensitivity to gaze plays in their language development, is not yet well-understood. To gain a better understanding of the role of eye gaze in infants' language learning, we conducted a broad systematic review of the developmental literature for all studies that investigate the role of eye gaze in infants' language development. Across 77 peer-reviewed articles containing data from typically developing human infants (0–24 months) in the domain of language development, we identified two broad themes. The first tracked the effect of eye gaze on four developmental domains: (1) vocabulary development, (2) word–object mapping, (3) object processing, and (4) speech processing. Overall, there is considerable evidence that infants learn more about objects and are more likely to form word–object mappings in the presence of eye gaze cues, both of which are necessary for learning words. In addition, there is good evidence for longitudinal relationships between infants' gaze following abilities and later receptive and expressive vocabulary. However, many domains (e.g., speech processing) are understudied; further work is needed to decide whether gaze effects are specific to tasks, such as word–object mapping or whether they reflect a general learning enhancement mechanism. The second theme explored the reasons why eye gaze might be facilitative for learning, addressing the question of whether eye gaze is treated by infants as a specialized socio-cognitive cue. We concluded that the balance of evidence supports the idea that eye gaze facilitates infants' learning by enhancing their arousal, memory, and attentional capacities to a greater extent than other low-level attentional cues. However, as yet, there are too few studies that directly compare the effect of eye gaze cues and non-social, attentional cues for strong conclusions to be drawn. We also suggest that there might be a developmental effect, with eye gaze, over the course of the first 2 years of life, developing into a truly ostensive cue that enhances language learning across the board.
Article
The amplitude envelope of speech carries crucial low-frequency acoustic information that assists linguistic decoding at multiple time scales. Neurophysiological signals are known to track the amplitude envelope of adult-directed speech (ADS), particularly in the theta band. Acoustic analysis of infant-directed speech (IDS) has revealed significantly greater modulation energy than in ADS in an amplitude-modulation (AM) band centered on ∼2 Hz. Accordingly, cortical tracking of IDS by delta-band neural signals may be key to language acquisition. Speech also contains acoustic information within its higher-frequency bands (beta, gamma). Adult EEG and MEG studies reveal an oscillatory hierarchy, whereby low-frequency (delta, theta) neural phase dynamics temporally organize the amplitude of high-frequency signals (phase-amplitude coupling, PAC). Whilst consensus is growing around the role of PAC in the mature adult brain, its role in the development of speech processing is unexplored. Here, we examined the presence and maturation of low-frequency (<12 Hz) cortical speech tracking in infants by recording EEG longitudinally from 60 participants at 4, 7 and 11 months of age as they listened to nursery rhymes. After establishing stimulus-related neural signals in delta and theta, cortical tracking at each age was assessed in the delta, theta and alpha (control) bands using a multivariate temporal response function (mTRF) method. Delta-beta, delta-gamma, theta-beta and theta-gamma phase-amplitude coupling (PAC) was also assessed. Significant delta and theta but not alpha tracking was found. Significant PAC was present at all ages, with both delta- and theta-driven coupling observed.
Article
Speech is a complex sound sequence that has rich acoustic and linguistic structures. Recent studies have suggested that low-frequency cortical activity can track linguistic units in speech, such as words and phrases, on top of low-level acoustic features. Here, with an artificial word learning paradigm, we investigate how different aspects of linguistic information, e.g., phonological, semantic, and orthographic information, modulate cortical tracking of words. Participants are randomly assigned to the experimental group or the control group. Both groups listen to speech streams composed of trisyllabic artificial words or trisyllabic real words. Participants in the experimental group explicitly learn different types of linguistic information of artificial words (phonological, phonological + semantic, or phonological + orthographic information), while participants in the control group do not explicitly learn the words. Electroencephalographic (EEG) data from the control group reveal weaker cortical tracking of artificial words than real words. However, when comparing the experimental and control groups, we find that explicit learning significantly improves neural tracking of artificial words. After explicit learning, cortical tracking of artificial words is comparable to real words, regardless of the training conditions. These results suggest training facilitates neural tracking of words and emphasize the basic role phonological information played in sequential grouping.
Article
The discovery of words in continuous speech is one of the first challenges faced by infants during language acquisition. This process is partially facilitated by statistical learning, the ability to discover and encode relevant patterns in the environment. Here, we used an electroencephalogram (EEG) index of neural entrainment to track 6-month-olds’ (N = 25) segmentation of words from continuous speech. Infants’ neural entrainment to embedded words increased logarithmically over the learning period, consistent with a perceptual shift from isolated syllables to wordlike units. Moreover, infants’ neural entrainment during learning predicted postlearning behavioral measures of word discrimination (n = 18). Finally, the logarithmic increase in entrainment to words was comparable in infants and adults, suggesting that infants and adults follow similar learning trajectories when tracking probability information among speech sounds. Statistical-learning effects in infants and adults may reflect overlapping neural mechanisms, which emerge early in life and are maintained throughout the life span.
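Neural entrainment to embedded trisyllabic words, as measured in this kind of paradigm, is typically read off the EEG spectrum as phase consistency across trials (inter-trial coherence) at the word rate, i.e., one third of the syllable rate. A minimal sketch with illustrative rates and synthetic data:

```python
# Minimal sketch: frequency-tagged word-rate entrainment via inter-trial coherence.
# Rates and synthetic data are illustrative assumptions.
import numpy as np

fs = 250.0
syllable_rate = 3.3            # syllables per second (assumption)
word_rate = syllable_rate / 3  # trisyllabic words -> 1.1 Hz
rng = np.random.default_rng(6)

# Placeholder: 20 trials of 10-s EEG with a weak injected word-rate component
t = np.arange(int(10 * fs)) / fs
trials = rng.standard_normal((20, t.size)) + 0.2 * np.sin(2 * np.pi * word_rate * t)

spectrum = np.fft.rfft(trials, axis=1)
freqs = np.fft.rfftfreq(t.size, d=1 / fs)

# Inter-trial coherence: consistency of spectral phase across trials
itc = np.abs(np.mean(spectrum / np.abs(spectrum), axis=0))

for rate, label in [(word_rate, "word"), (syllable_rate, "syllable")]:
    idx = int(np.argmin(np.abs(freqs - rate)))
    print(f"ITC at the {label} rate ({freqs[idx]:.2f} Hz): {itc[idx]:.3f}")
```

A rise in ITC at the word rate over the learning period, relative to the syllable rate, is the signature such studies interpret as segmentation of the stream into words.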