NeuroImage 251 (2022) 118991
Natural infant-directed speech facilitates neural tracking of prosody
Katharina H. Menn a,b,c,1,∗, Christine Michel d,e,1, Lars Meyer b,f, Stefanie Hoehl g,1, Claudia Männel a,h,1
a Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstr. 1a, Leipzig 04103, Germany
b Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstr. 1a, Leipzig 04103, Germany
c International Max Planck Research School on Neuroscience of Communication: Function, Structure, and Plasticity, Stephanstr. 1a, Leipzig 04103, Germany
d Research Group Early Social Cognition, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstr. 1a, Leipzig 04103, Germany
e Faculty for Education, Leipzig University, Marschnerstraße 31, Leipzig 04109, Germany
f Clinic for Phoniatrics and Pedaudiology, University Hospital Münster, Albert-Schweitzer-Campus 1, Münster 48149, Germany
g Faculty of Psychology, University of Vienna, Universitätsring 1, Vienna 1010, Austria
h Department of Audiology and Phoniatrics, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin 13353, Germany
Keywords:
EEG
Speech-brain coherence
Speech entrainment
Infant-directed speech
Natural interaction
Adult-directed speech
Infants prefer to be addressed with infant-directed speech (IDS). IDS benefits language acquisition through amplified low-frequency amplitude modulations. It has been reported that this amplification increases electrophysiological tracking of IDS compared to adult-directed speech (ADS). It is still unknown which particular frequency band triggers this effect. Here, we compare tracking at the rates of syllables and prosodic stress, which are both critical to word segmentation and recognition. In mother-infant dyads (n = 30), mothers described novel objects to their 9-month-olds while infants' EEG was recorded. For IDS, mothers were instructed to speak to their children as they typically do, while for ADS, mothers described the objects as if speaking with an adult. Phonetic analyses confirmed that pitch features were more prototypically infant-directed in the IDS-condition compared to the ADS-condition. Neural tracking of speech was assessed by speech-brain coherence, which measures the synchronization between speech envelope and EEG. Results revealed significant speech-brain coherence at both syllabic and prosodic stress rates, indicating that infants track speech in IDS and ADS at both rates. We found significantly higher speech-brain coherence for IDS compared to ADS in the prosodic stress rate but not the syllabic rate. This indicates that the IDS benefit arises primarily from enhanced prosodic stress. Thus, neural tracking is sensitive to parents' speech adaptations during natural interactions, possibly facilitating higher-level inferential processes such as word segmentation from continuous speech.
1. Introduction
Across many languages, adults address infants in a characteristic register termed infant-directed speech (IDS) (Cristia, 2013; Fernald et al., 1989; Soderstrom, 2007). IDS differs from adult-directed speech (ADS) along acoustic and linguistic dimensions. In particular, IDS contains exaggerated prosodic cues (Fernald and Simon, 1984; Fernald et al., 1989; Grieser and Kuhl, 1988; Katz et al., 1996), is syntactically simpler (Soderstrom et al., 2008) and may be spoken more slowly (Raneri et al., 2020) with expanded vowel sounds (Adriaans and Swingley, 2017; Green et al., 2010). Previous electrophysiological (EEG) work has indicated that these IDS characteristics benefit infants' speech processing (e.g. Háden et al., 2020; Zangl and Mills, 2007). While earlier
EEG studies mostly focused on event-related potentials, we here employ EEG to examine infants' online speech processing continuously.
∗ Corresponding author. E-mail address: menn@cbs.mpg.de (K.H. Menn).
1 These authors each contributed equally and should be regarded as shared-first and shared-senior authors, respectively.
There are indications that IDS benefits infants' language acquisition in particular. Frequent exposure to IDS boosts later vocabulary development (Ramírez-Esparza et al., 2014; Weisleder and Fernald, 2013) and laboratory studies showed that IDS assists infants' word segmentation (Schreiner and Mani, 2017; Thiessen et al., 2005) and recognition (Männel and Friederici, 2013; Singh et al., 2009), and their acquisition of word-object associations (Graf Estes and Hurley, 2013) over ADS.
Which specific acoustic cues in IDS help infants' language acquisition? Candidates include increased fundamental frequency (F0) and F0 modulation (see Spinelli et al., 2017 for a meta-analysis). In recent years, a particular focus has been put on the amplitude modulation structure in IDS. Continuous speech contains acoustic information at different timescales, which to a certain extent correspond to linguistic units, such as phonemes, syllables, and intonation phrases. In particular, the amplitude envelope conveys the boundaries of linguistic units even to infant listeners who lack vocabulary as such (see also
Goswami, 2019). Leong and Goswami (2015) analyzed the amplitude modulation structure of nursery rhymes, a particularly rhythmic form of IDS, which were read by female speakers prompted with a picture depicting young children. The authors found that amplitude modulations are centered around three frequency rates, which match the occurrence rates of: prosodic stress (~2 Hz), syllables (~5 Hz), and phonemes (~20 Hz). When comparing spontaneously produced IDS during mother-infant interactions to ADS that the mother produced when interacting with another adult, Leong et al. (2017) found that amplitude modulations of prosodic stress are enhanced for IDS compared to ADS. This exaggeration of prosodic stress in IDS may be beneficial for infants' language development, as stress can provide an important cue for word onsets in naturalistic speech (Cutler and Carter, 1987; Jusczyk et al., 1999; Stärk et al., 2021) and thus aid word segmentation. If infants are sensitive to the pronounced stress modulations in IDS, these could thus provide an important stepping stone into language acquisition.
Recent studies have shown that infants' neural activity tracks speech by synchronizing with amplitude modulations corresponding to prosodic stress and syllables in nursery rhymes (Attaheri et al., 2022). For adults, it has been shown that the synchronization between neural activity and speech acoustics supports the segmentation and identification of linguistic units in speech (see Meyer, 2018) and relates to better language comprehension (Doelling et al., 2014; Peelle et al., 2013). Importantly, infants were shown to start tracking simple repeated sentences from birth (Ortiz Barajas et al., 2021). This early emergence suggests that neural tracking may support language development by aligning neural activity with speech-relevant amplitude modulations. At least by 7 months of age, infants' tracking is sensitive to the kind of speech register (IDS vs. ADS) and IDS benefits tracking of speech over ADS (Kalashnikova et al., 2018). It remains unclear, however, whether this benefit results specifically from prosodic stress or other speech characteristics, such as the syllable rhythm.
We here assess infants' tracking of speech in a naturalistic mother-infant interaction. The use of naturalistic IDS has the benefit of high ecological validity, as it elucidates infants' neural processing of the speech input they typically receive and thus increases generalizability of findings. Naturalistic stimuli allow for the dissociation of multiple levels of information in parallel (see also Jessen et al., 2021). For this reason, the number of studies relying on naturalistic input for investigating infants' neural processing of speech has recently started to increase and stimuli included recordings taken from natural mother-infant interactions (Kalashnikova et al., 2018), TV cartoons (Jessen et al., 2019) and one study even directly assessed face-to-face interactions (Lloyd-Fox et al., 2015). In face-to-face interactions, the speaker's visual cues are contingent with infant responses, which is difficult to manipulate in classical experiments. For the current study, the most relevant of these contingent cues is eye contact between parents and infants (mutual gaze), which was shown to increase neural processing of speech if combined with IDS (Lloyd-Fox et al., 2015). However, given the difficulty of manipulating mutual gaze experimentally, the specific effects on infants' speech processing are currently not well understood (for a review, see Çetinçelik et al., 2020).
In the current study we focus on the association between parental acoustic speech adaptations and infants' tracking, aiming at delineating whether neural tracking is facilitated by prosodic stress (defined by pitch contours) or syllable information (defined by the mean syllable duration) in IDS. To this end, we here contrast 9-month-old infants' responses to their mothers' IDS versus ADS at the stress rate and the syllabic rate. Focusing on 9-month-olds is particularly interesting, as infants at this age have started segmenting words from continuous speech but still mostly rely on prosodic cues (Männel and Friederici, 2013; Schreiner and Mani, 2017), meaning that information in the prosodic stress rate is particularly relevant for their word segmentation (Kooijman et al., 2009). In mother-infant dyads, mothers described novel objects to their 9-month-olds while the infants' EEG was recorded. For IDS, parents were instructed to speak to their infants as they typically do, while for ADS, parents were supposed to describe the objects pretending to talk to an adult without looking at the infant or calling their name. Infants' tracking of maternal speech during the interactions was assessed using speech-brain coherence, which measures the synchronization between the neural signal and the speech envelope. We hypothesized that infants show speech-brain coherence at both the stress rate and the syllable rate. Concerning the difference between IDS and ADS processing, we postulate that IDS facilitates tracking (Kalashnikova et al., 2018) and that this facilitation is driven by enhanced amplitude modulations of prosodic stress (Leong et al., 2017).
2. Method
The present study reanalyzed data from a previous experiment, which assessed the influence of ostensive cues on infants' visual object encoding (Michel et al., 2021). Parents were asked to show and describe a total of 12 novel objects to their infant during a familiarization phase. Half of the objects were described naturally (IDS-condition), the other half were described without ostensive cues, that is, without mutual gaze, calling the infant by their name, or infant-directed speech (ADS-condition). Importantly, parents were asked to refrain from naming the objects. Given the aim of the present study to examine infants' neural processing of natural parental speech, we here assessed infants' tracking of maternal speech during the mother-infant interactions. Only the object description phase was analyzed for the purpose of the current study and will be described in this manuscript.
2.1. Participants
The final participant sample consisted of 30 German-learning infants (22 female) and their mothers. On average, infants were 9 months 12 days old (range: 9 months 0 days - 9 months 29 days). Infants were born full-term (> 37 weeks), healthy, and raised in monolingual German environments. Our sample size was determined by the previously collected dataset. Michel et al. (2021) based their sample size on studies investigating infants' object encoding using similar paradigms and measures (e.g. Begus et al., 2015; Hoehl et al., 2014).
An additional 51 mother-infant dyads (16 female, M_age = 9 months 15 days) were tested, but not included in the current analysis due to less than 30 s total maternal speech in one of the conditions (n = 17), more than 4 noisy electrodes (n = 1), failure to reach the minimum criterion of 20 EEG epochs per condition after artifact rejection (n = 19), premature birth (n = 1), technical error (n = 6), or infant fussiness (n = 7). Because of the different foci of this manuscript and the original study (Michel et al., 2021), the exclusion criteria differed between the manuscripts and only 19 infants were commonly included in both. Informed written consent was obtained from the mothers before the experiment and ethical approval for the experimental procedure and reanalysis of the data was obtained from the Medical Faculty of the University of Leipzig. All work was conducted in accordance with the Declaration of Helsinki. The conditions of our ethics approval do not permit public archiving of participant data. Readers seeking access to the data should contact the corresponding author to arrange a formal data sharing agreement.
2.2. Procedure
Mothers and infants were seated across from each other at a small table. Infants sat in
a baby chair while their electrophysiological activity was continuously
recorded using EEG. Mother-infant interactions were recorded on video
using four cameras and maternal speech was recorded using a micro-
phone that was placed on the table in front of the mother (see Fig. 1 A).
Fig. 1. Overview of the experiment and analysis. (A) Example of the setting during the mother-infant interactions. Mother and infant sat across from each other at a table. The mother held a novel object and described it to her infant either using IDS or using ADS, while the infant's EEG was recorded. (B) Overview of the speech-brain coherence analysis. Cleaned EEG and speech envelope were band-pass filtered in two frequency bands: prosodic stress rate and syllable rate. Coherence between EEG and envelope was computed for each electrode in both frequency bands.
The study consisted of 4 blocks, during each of which the mother held three novel objects above the table and spoke about them to her infant. The blocks alternated between the IDS-condition and the ADS-condition. The only difference between the two conditions was the way in which the mother was asked to describe the objects. Mothers were told that the aim of the study was to investigate the difference between joint observation and individual processing of objects on infants' visual object encoding, as this was the goal of the original study. They were specifically told to focus on eye gaze and speech. In the IDS-condition, the mother was asked to speak to her infant as she normally would when interacting with a novel object. She was specifically told that she could use IDS, call the infant's name and look at the infant. In the ADS-condition, the mother was instructed to describe the object as if she were speaking to an adult, that is, she was asked to imagine that she was talking to herself or describing the objects to a close friend. She was also asked to refrain from calling the infant's name and looking at the infant, and specifically from establishing eye gaze during the ADS-condition. In both conditions, the infant was not allowed to touch the objects. The condition of the first block was counterbalanced between dyads. Mothers were given standardized oral and written instructions and were reminded of the procedure before every block.
Each block started with a 20 s baseline, during which infant and
mother looked at soap bubbles produced by an experimenter. After-
wards, the object description phase started either after mutual gaze be-
tween infant and parent had been established (IDS-condition) or after
the child looked at the mother (ADS-condition). In both conditions, the
trial ended after the infant looked at the object for a cumulative total of
20 s. Looking duration was coded online by an experimenter observing
the interactions on a screen. A second experimenter then announced the
end of a trial by thanking the mother and switched the object. Average
trial duration was 39.2 s ( SD = 8.6; see Supplementary Fig. 1 for an
overview of the whole procedure). Mothers were unaware of the look-
ing time criterion. None of the objects had eyes or face-like features on
it. Pretests with an independent sample of infants confirmed that, in
general, infants were unfamiliar with the objects and all objects were
similarly interesting to infants.
2.3. Speech processing
2.3.1. Preprocessing
Audio recordings were annotated and analyzed using Praat
( Boersma, 2001 ). We annotated every instance of maternal speech dur-
ing the object description phase, excluding fragments with any non-
speech interference. Instances of such interference included: infant vo-
calizations, laughter, external noise, or (rhythmic) non-speech sounds,
such as knocking the object on the table, scratching the surface of the
object or tapping against the object. Speech segments with pauses longer
than 1000 ms were coded as separate segments.
2.3.2. Amplitude envelope
The broad-band amplitude envelope of the audio signals was
computed following Gross et al. (2013) using the Chimera toolbox
( Smith et al., 2002 ). The intensity of the speech signal was normalized
per condition. We divided the frequency spectrum from 100 - 8000 Hz
into nine frequency bands equally spaced on the cochlea. The audio
signal was band-pass filtered into these frequency bands with a fourth-order Butterworth filter (forward and backward). Afterwards, the absolute values of the Hilbert transform were computed for each band and
averaged across bands. Last, the envelope was downsampled to 500 Hz,
which corresponds to the sampling rate of the EEG signal.
In addition, we computed the pitch envelope for both conditions sep-
arately. For this we determined the respective F0 range for both speech
conditions (IDS: 145 - 392 Hz; ADS: 138 - 325 Hz), which we divided
into three frequency bands equally spaced on the cochlea. We then fol-
lowed the same procedure as described for the broad-band envelope.
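As an illustration of this envelope pipeline, a Python sketch is given below. The authors used the Chimera toolbox in Matlab; the Greenwood-map approximation for "equal spacing on the cochlea", the function name, and all parameter choices here are our assumptions, not their exact implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample_poly

def broadband_envelope(audio, fs, n_bands=9, fmin=100.0, fmax=8000.0, fs_out=500):
    """Sketch of the envelope computation: band-pass into n_bands spaced
    equally on the cochlea, take the Hilbert magnitude per band, average
    across bands, and downsample to the EEG sampling rate."""
    # Cochlear spacing approximated via the Greenwood map (an assumption;
    # the paper relies on the Chimera toolbox for the exact band edges).
    def greenwood_pos(f):   # frequency (Hz) -> relative cochlear position
        return np.log10(f / 165.4 + 0.88) / 2.1
    def greenwood_freq(x):  # relative cochlear position -> frequency (Hz)
        return 165.4 * (10 ** (2.1 * x) - 0.88)
    edges = greenwood_freq(
        np.linspace(greenwood_pos(fmin), greenwood_pos(fmax), n_bands + 1))
    env = np.zeros(len(audio))
    for lo, hi in zip(edges[:-1], edges[1:]):
        # fourth-order Butterworth, applied forward and backward (zero phase)
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        band = filtfilt(b, a, audio)
        env += np.abs(hilbert(band))   # per-band Hilbert magnitude
    env /= n_bands                      # average across bands
    return resample_poly(env, fs_out, fs)  # downsample to the EEG rate
```

The same routine applied to three F0-range bands would yield the pitch envelope described above.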
2.3.3. Frequency bands
To identify the syllable rate of mothers' IDS and ADS, we annotated the duration of all syllables for the dyads included in the final analysis. The average syllable duration was 194 ms for the ADS-condition and 181 ms for the IDS-condition. The syllable rate band was determined as the 2 Hz window centered on the rate corresponding to the average syllable duration (ADS: 194 ms or 5.15 Hz; IDS: 181 ms or 5.5 Hz), leading to 4.15 Hz - 6.15 Hz for ADS and 4.5 - 6.5 Hz for IDS.
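The band arithmetic above can be written out in a few lines (the function name is ours, for illustration only):

```python
# A mean syllable duration maps to a rate (1/duration); the analysis band
# is the 2 Hz window centred on that rate (1 Hz on each side).
def syllable_band(mean_syllable_s, half_width_hz=1.0):
    rate = 1.0 / mean_syllable_s
    return rate, (rate - half_width_hz, rate + half_width_hz)

rate_ads, band_ads = syllable_band(0.194)  # ~5.15 Hz -> 4.15 - 6.15 Hz (ADS)
rate_ids, band_ids = syllable_band(0.181)  # ~5.5 Hz  -> 4.5 - 6.5 Hz (IDS)
```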
The prosodic stress rate of mothers’ speech was identied based
on the pitch envelope. For this, we segmented the parts of the pitch
envelope corresponding to uninterrupted maternal speech into epochs
of 2 s length with 50% overlap. We then computed the Fourier trans-
form of each epoch using Slepian multitapers and averaged the resulting
power spectral density (PSD) estimate across epochs and dyads for both
speech conditions. The averaged PSD was visually inspected for devi-
ations from the aperiodic 1/f noise. This way the frequency band for
the prosodic stress rate was determined as 1 - 2.5 Hz. We decided not
to assess amplitudes below 1 Hz since this is the high-pass frequency
recommended for the preprocessing of developmental EEG data (see
e.g. Gabard-Durnam et al., 2018 ). The bands identied for the prosodic
stress rate and the syllable rate were in line with rates reported in pre-
vious studies (e.g. Chandrasekaran et al., 2009; Leong and Goswami,
2015 ).
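The epoched multitaper PSD estimate can be sketched as follows. The authors computed this step in Matlab (FieldTrip); this minimal Slepian-taper estimator, with our own function name and normalisation, only illustrates the idea.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(epochs, fs, nw=2.0, n_tapers=3):
    """Average multitaper PSD across epochs.

    epochs: array of shape (n_epochs, n_samples), e.g. 2-s segments
    with 50% overlap. Each epoch is multiplied with each Slepian taper,
    Fourier-transformed, and the power is averaged over tapers and epochs."""
    n = epochs.shape[1]
    tapers = dpss(n, nw, n_tapers)                     # (n_tapers, n)
    spec = np.fft.rfft(epochs[:, None, :] * tapers[None, :, :], axis=-1)
    psd = (np.abs(spec) ** 2).mean(axis=(0, 1)) / fs   # avg over epochs, tapers
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    return freqs, psd
```

Visual inspection of such a spectrum for deviations from the aperiodic 1/f trend is what identified the 1 - 2.5 Hz stress band.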
2.3.4. Amplitude modulations
To compute the amplitude modulations at the syllable rate, we filtered the broad-band amplitude envelope in the corresponding frequency bands for IDS and ADS. We then segmented the parts of the envelope corresponding to uninterrupted maternal speech into epochs of 2 s length with 50% overlap. Root mean square values were computed for every epoch and averaged across epochs for both speech conditions.
Amplitude modulations in the prosodic stress rate were computed based on the pitch envelope. We band-pass filtered the pitch envelope in the frequency band corresponding to prosodic stress before proceeding in the same way as described for the syllable rate.
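A minimal sketch of the per-epoch RMS measure, assuming a band-limited envelope sampled at 500 Hz (the function name is ours):

```python
import numpy as np

def epoch_rms(envelope, fs=500, epoch_s=2.0, overlap=0.5):
    """RMS of a band-limited envelope in 2-s epochs with 50% overlap,
    averaged across epochs: the modulation-strength measure per condition."""
    n = int(epoch_s * fs)
    step = int(n * (1 - overlap))
    starts = range(0, len(envelope) - n + 1, step)
    rms = [np.sqrt(np.mean(envelope[s:s + n] ** 2)) for s in starts]
    return float(np.mean(rms))
```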
2.4. Experimental manipulation check
To assess whether the speech in the IDS-condition was more typ-
ically infant-directed than speech in the ADS-condition, we measured
the mean F0 and F0 range (between the 5th and the 95th percentile) of
maternal speech in both conditions as an acoustic correlate of IDS (see Spinelli et al., 2017). In addition, we tested whether the amplitude modulations in the prosodic stress rate and the syllable rate differed between IDS versus ADS. We ran separate t-tests for each acoustic measure, assessing a difference between the IDS- and the ADS-condition. Note that we opted for separate tests in assessing condition differences in amplitude modulations in the two frequency bands since they were computed based on different envelopes and are therefore not directly comparable. Resulting p-values were corrected for multiple comparisons using false discovery rate (FDR) correction.
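The FDR correction named here is conventionally the Benjamini-Hochberg procedure; the authors' exact routine is not specified, so the textbook implementation below is for illustration only.

```python
import numpy as np

def fdr_bh(pvals):
    """Benjamini-Hochberg FDR: return adjusted p-values (q-values).
    q_i = min over j >= i of p_(j) * n / j, for sorted p-values p_(j)."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
    # enforce monotonicity from the largest p-value downwards
    q = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty_like(q)
    out[order] = np.clip(q, 0, 1)
    return out
```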
2.5. EEG-Recording and preprocessing
EEG was recorded with a 32-channel EasyCap system by Brain Prod-
ucts GmbH, with active electrodes arranged according to the 10/10 sys-
tem. The sampling rate of the recordings was 500 Hz. The right mas-
toid served as the online reference and vertical electrooculograms were
recorded bipolarly if tolerated by the infant.
EEG processing was done using the publicly available 'eeglab' (Delorme and Makeig, 2004) and 'fieldtrip' (Oostenveld et al., 2011)
toolboxes as well as custom Matlab code (The MathWorks, Inc., Natick,
US). EEG preprocessing was done automatically using a modified version of the Harvard Automated Preprocessing Pipeline (HAPPE: Gabard-Durnam et al., 2018). In line with HAPPE, data was re-referenced to Cz to obtain symmetrical components in the ICA, high-pass filtered with a noncausal finite impulse response filter (pass-band: 1 Hz, -6 dB cutoff: 0.5 Hz) and electrical line noise (50 Hz) was removed using ZapLine from NoiseTools (de Cheveigné, 2020). Noisy channels were identified by assessing the normed joint probability of the average log power from 1 - 125 Hz and rejected if exceeding a threshold of 3 SD from the mean
(mean number of removed channels = 1; range: 0–4). We applied a
wavelet-enhanced ICA ( Castellanos and Makarov, 2006 ) with a thresh-
old of 3 to remove large artifacts, before the data was decomposed with
ICA and artifact-related components were automatically rejected using
MARA ( Winkler et al., 2011 ; mean number of rejected components =
14, range: 7–25). Afterwards, noisy channels were interpolated using
spherical splines and the data was re-referenced to the linked mastoids.
EEG data and the broad-band speech envelope were band-pass filtered at the stress and syllable rates. Filter order was optimised using the Parks-McClellan algorithm (Parks and McClellan, 1972). For the prosodic stress band, this resulted in a 14572nd-order one-pass 1 - 2.5 Hz band-pass filter. The phase shift was compensated for by a corresponding time shift. For the syllabic band, we used a 15883rd-order one-pass filter with pass frequencies of 4.5 - 6.5 Hz for IDS and 4.15 - 6.15 Hz for ADS. All data were padded before filter application.
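The design-and-shift step can be illustrated with SciPy's `remez` (Parks-McClellan). For tractability this sketch designs a 101-tap filter for the IDS syllable band at a decimated 50 Hz rate, not the 15883rd-order filter at 500 Hz reported above; the delay compensation shows the time shift that undoes the one-pass phase shift of a linear-phase FIR.

```python
import numpy as np
from scipy.signal import remez, lfilter, freqz

# Equiripple (Parks-McClellan) FIR band-pass for the IDS syllable band
# (4.5 - 6.5 Hz), designed at a decimated 50 Hz rate with 101 taps.
fs = 50.0
numtaps = 101
taps = remez(numtaps, [0, 2.5, 4.5, 6.5, 8.5, fs / 2], [0, 1, 0], fs=fs)

# One-pass filtering delays a linear-phase FIR by (numtaps - 1) / 2 samples;
# shifting the output back compensates the phase shift.
delay = (numtaps - 1) // 2

t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 5.5 * t)            # in-band test tone
y_aligned = lfilter(taps, 1.0, x)[delay:]  # filter, then undo the group delay
```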
The artifact-corrected EEG data was segmented into continuous trials
corresponding to the annotated maternal speech and combined with the
respective broad-band speech envelope, which had been downsampled
to 500 Hz. The combined data was segmented into 2 second epochs with
50% overlap. Epochs with amplitudes exceeding ±40 𝜇V in any channel
were rejected automatically. On average, infants contributed a total of
112 epochs to the analysis (M_IDS = 57.8, SD = 27.4; M_ADS = 54.2, SD = 32.8). The 23 channels included in the final analysis were: Fz, F3/4, F7/8, FC1/2, FC3/4, FT7/8, Cz, C3/4, T7/8, CP3/4, Pz, P3/4, and P7/8. We removed the outer channels from the final analysis, since the EEG signal was consistently noisy across infants.
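The ±40 µV rejection rule can be sketched as follows (the array layout and function name are our assumptions):

```python
import numpy as np

def reject_epochs(epochs, threshold_uv=40.0):
    """Drop epochs whose absolute amplitude exceeds the threshold in any
    channel; `epochs` has shape (n_epochs, n_channels, n_samples)."""
    keep = np.all(np.abs(epochs) <= threshold_uv, axis=(1, 2))
    return epochs[keep]
```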
2.6. Data analysis
2.6.1. Speech-brain coherence
The relationship between speech and brain signal was quantified using Hilbert coherence over time (see Fig. 1 B). The coherence value measures the phase synchronization between the EEG signal and the corresponding speech envelope, weighted by their relative amplitude. Coherence is measured on a scale from 0 (random coupling) to 1 (perfect synchronization).
Coherence between the speech envelope and individual electrodes in both frequency rates was computed according to the formula Coh_xy(f) = |P_xy(f)|^2 / (P_xx(f) P_yy(f)), where P_xy(f) is the cross-spectral density between the band-pass filtered speech and EEG signals, and P_xx(f) and P_yy(f) are the auto-spectral densities of the speech and EEG signals, respectively.
To analyze whether speech-brain coherence was higher than ex-
pected by chance, the observed coherence values were compared against
surrogate data. Surrogate data was created by randomly pairing the epoched EEG data with the broad-band speech envelope from a randomly selected epoch from the same or a different dyad and applying a circular shift to the envelope time series (Keitel et al., 2017). This
process was repeated for 10,000 permutations.
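One surrogate pairing can be sketched as follows (random epoch selection plus a random circular shift; the function name is ours). The shift destroys the true speech-brain alignment while preserving the envelope's spectral content.

```python
import numpy as np

def surrogate_envelope(envelope_epochs, rng):
    """Draw one surrogate envelope: pick a random epoch from the pool
    (same or different dyad) and circularly shift it by a random offset."""
    env = envelope_epochs[rng.integers(len(envelope_epochs))]
    return np.roll(env, rng.integers(1, len(env)))
```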
2.6.2. Analyses
The observed and permuted coherence values for each infant were averaged across trials and channels. P-values were derived as the proportion of coherence values in the permutation distribution exceeding the observed value. To assess differences between IDS and ADS, we ran a repeated-measures ANOVA with speech condition (IDS vs. ADS) and frequency rate (syllabic rate vs. prosodic rate) as within-subjects factors.
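The permutation p-value can be sketched as below; note that the `+1` small-sample correction is a common convention we assume for illustration, while the text describes a plain proportion.

```python
import numpy as np

def permutation_p(observed, surrogate):
    """p-value as the proportion of surrogate values that reach or exceed
    the observed coherence (with a +1 correction so p is never exactly 0)."""
    surrogate = np.asarray(surrogate)
    return (np.sum(surrogate >= observed) + 1) / (len(surrogate) + 1)
```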
3. Results
Maternal speech in the IDS-condition was more prototypically infant-directed than in the ADS-condition. Speech had a significantly higher mean pitch, t(29) = 7.2, p < .001, and pitch range, t(29) = 6.21, p < .001, in the IDS-condition compared to the ADS-condition. The amplitude modulations were significantly higher for IDS than ADS in the stress rate, t(29) = 4.1, p < .001, but not in the syllable rate, t(29) = 0.71, p = .482. Table 1 summarizes the descriptive statistics of the acoustic measures. For further summary statistics of speech content, see Supplementary Table 1.
The permutation test showed significant speech-brain coherence for both the prosodic stress rate, p < .001, and the syllable rate, p < .001 (Fig. 2 A). The repeated-measures ANOVA showed a significant main effect of speech condition, F(1, 29) = 160.77, p < .001, and no significant main effect of frequency rate, F(1, 29) = 2.43, p = .13. Importantly, we observed a significant interaction between speech condition and frequency rate, F(1, 29) = 9.14, p = .005 (Fig. 2 B). Follow-up t-tests revealed that speech-brain coherence for the stress rate was significantly higher in the IDS-condition (M_IDS = 0.492, SD = 0.025) than in the ADS-condition (M_ADS = 0.475, SD = 0.022), t(29) = 3.4, p = .002. We found no evidence for a difference between the IDS-condition (M_IDS = 0.419, SD = 0.02) and the ADS-condition (M_ADS = 0.425, SD = 0.02) for the syllable rate, t(29) = -0.99, p = .33. Analyses were repeated on non-normalized data to ensure that the difference between conditions did not arise from intensity differences. The pattern of the results did not change.
Table 1
Analysis of speech acoustics. Standard deviation in brackets.

Acoustic Measure                                          IDS           ADS           p-value
Pitch (F0): Mean                                          238 Hz (28)   214 Hz (19)   < .001
Pitch (F0): Range                                         247 Hz (62)   188 Hz (49)   < .001
Amplitude Modulations (a.u.; 1 x 10^-3): Stress Rate      2.5 (0.50)    2.1 (0.46)    < .001
Amplitude Modulations (a.u.; 1 x 10^-3): Syllable Rate    1 (0.14)      0.96 (0.15)   .482
Fig. 2. Overview of our results. (A) Coherence values were averaged across all electrodes. Errorbars depict standard errors. Dashed lines indicate 95% significance cut-offs based on a permutation baseline. Speech-brain coherence was significantly higher than chance for both IDS and ADS in the two frequency rates. (B) Scalp topography for the comparison IDS versus ADS. Asterisks indicate electrodes included in the cluster in the control analysis. For the main analysis, we compared averages across all electrodes. The difference between IDS and ADS was significantly higher in the stress rate than in the syllable rate.
3.1. Control analysis: Ostensive cues
Ostensive cues potentially influence speech processing (see Çetinçelik et al., 2020; Csibra and Gergely, 2009). In our study, such cues were primarily present in the IDS-condition. We therefore conducted additional analyses to control for the possibility that the tracking difference between IDS and ADS observed in our study was based on differences in ostensive cues, specifically focusing on mutual eye gaze, infant looks to the mother's face and mentioning the infant's name.
In every frame of the video recording, mother's and infant's gaze were coded as looking to the object, to the face of the interaction partner, to the environment or as non-codeable. The reliability of the codes was excellent (ICC for mothers = 0.994, ICC for infants = 0.987). Mutual gaze was defined as periods with simultaneous gaze on the other interaction partner. We then reanalyzed the data excluding all epochs containing mutual eye gaze. On average, infants contributed a total of 103 epochs to the follow-up analysis (M_IDS = 49.4, SD = 23.2; M_ADS = 54.1, SD = 32.7). A paired t-test comparing the speech conditions in the stress rate showed that speech-brain coherence was still significantly higher for the IDS-condition (M_IDS = 0.489, SD = 0.023) than the ADS-condition (M_ADS = 0.475, SD = 0.022) after controlling for the effect of mutual eye gaze, t(29) = 2.87, p = .008. It is, however, possible that infants show a sustained effect of mutual gaze beyond the epoch. We therefore also excluded the 5 epochs succeeding mutual eye gaze. This also did not change the pattern of our results. Note that we were unable to exclude the whole object description trial in which mutual eye gaze occurred, as this would have left us with too few epochs for a reliable comparison.
In addition, we compared tracking of IDS in the prosodic stress rate between infants with high mutual gaze and infants with low mutual gaze, grouped by a median split of the number of epochs containing mutual gaze. The two groups did not significantly differ, t(28) = 0.467, p = .64.
To assess the possibility that the IDS advantage for tracking in the prosodic stress rate was driven by maternal visual cues other than mutual gaze, we excluded all epochs in which the infant looked at the mother's face, irrespective of whether there was mutual gaze or not. On average, infants contributed a total of 90.9 remaining trials to this follow-up analysis (M_IDS = 45.1, SD = 23.3; M_ADS = 45.8, SD = 26.57). Speech-brain coherence in the prosodic stress rate remained significantly higher for the IDS-condition (M_IDS = 0.489, SD = 0.026) than the ADS-condition (M_ADS = 0.472, SD = 0.025) after excluding these epochs in which infants were looking at their mother's face, t(29) = 3.07, p = .005.
Lastly, we assessed whether the amount of name calling in the
IDS-condition drove the IDS facilitation at the stress rate. On average,
mothers called their infant's name 3.9 times in the IDS-condition
(SD = 3.7). We compared tracking at the stress rate between infants who
experienced frequent calling of their name and infants who experienced
infrequent calling of their name, grouped based on a median split
(median = 3.5). There was no significant difference between the two
name-calling groups, t(28) = 0.7, p = .489. Note that we only controlled
for instances in which the infant's full name or an abbreviation of it
was mentioned, but not for other potentially attention-evoking phrases
that mothers commonly use in IDS. We therefore cannot fully rule out
that the use of such phrases increased attention specifically in the IDS-
condition.
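The speech-brain coherence values compared throughout this section can be estimated in several ways. The authors' custom Hilbert-coherence code is not public, so the following is only a minimal sketch of one common band-limited estimator; the filter settings, toy signals, and function name are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def hilbert_coherence(envelope, eeg, fs, f_lo, f_hi, n_epochs):
    """Band-limited magnitude-squared coherence, averaged across epochs."""
    sos = butter(4, [f_lo, f_hi], btype="band", fs=fs, output="sos")
    x = hilbert(sosfiltfilt(sos, envelope))   # analytic speech envelope
    y = hilbert(sosfiltfilt(sos, eeg))        # analytic EEG channel
    xs, ys = np.array_split(x, n_epochs), np.array_split(y, n_epochs)
    sxy = np.mean([np.mean(xe * np.conj(ye)) for xe, ye in zip(xs, ys)])
    sxx = np.mean([np.mean(np.abs(xe) ** 2) for xe in xs])
    syy = np.mean([np.mean(np.abs(ye) ** 2) for ye in ys])
    return np.abs(sxy) ** 2 / (sxx * syy)     # bounded between 0 and 1

# Toy check: an EEG channel that tracks a ~1.8 Hz (stress-rate) envelope
fs = 500
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(1)
env = 1 + np.sin(2 * np.pi * 1.8 * t)
eeg = np.sin(2 * np.pi * 1.8 * t - 0.6) + rng.normal(0, 1, t.size)
coh = hilbert_coherence(env, eeg, fs, 1.0, 2.6, n_epochs=12)
print(round(coh, 3))
```

A constant phase lag between envelope and EEG (as in the toy signal) leaves coherence high; only inconsistent phase relations across epochs drive it toward zero, which is what makes the measure suitable for quantifying tracking.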
K.H. Menn, C. Michel, L. Meyer et al. NeuroImage 251 (2022) 118991
3.2. Control analysis: Topography

All EEG analyses reported above were performed on coherence values
averaged across the 23 selected electrodes. This approach may hide
topographic differences between the IDS- and the ADS-condition in the
two frequencies of interest. To assess this possibility, we conducted
a control analysis at the electrode level, using threshold-free cluster
enhancement with 10,000 permutations for multiple-comparison correction
(height-weight = 2, extent-weight = 0.5; Smith and Nichols, 2009).
In line with our earlier results, we found a significant difference between
the IDS- and the ADS-condition at the prosodic stress rate (p < .001), but
not at the syllable rate. The difference at the stress rate was driven by a
left-central cluster that included electrodes F3, FC3, FC1, C3, CP3, P3,
Cz, FC2, FC4, and CP4. These electrodes are marked by asterisks in the
topography plot in Fig. 2B.
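Threshold-free cluster enhancement integrates cluster extent over all thresholds instead of fixing a single cluster-forming threshold. The toy sketch below implements only the enhancement step, for a 1-D statistic map, with the height and extent weights quoted above (H = 2, E = 0.5); a real EEG analysis would additionally use a channel-adjacency structure and 10,000 permutations of condition labels, which are omitted here.

```python
import numpy as np

def tfce_1d(stat, dh=0.1, h_weight=2.0, e_weight=0.5):
    """Enhance a 1-D map: accumulate extent**E * height**H over thresholds."""
    enhanced = np.zeros_like(stat, dtype=float)
    for h in np.arange(dh, stat.max() + dh, dh):
        # pad with zeros so every suprathreshold run has a start and a stop
        above = np.concatenate(([0], (stat >= h).astype(np.int8), [0]))
        edges = np.diff(above)
        starts = np.where(edges == 1)[0]
        stops = np.where(edges == -1)[0]
        for s, e in zip(starts, stops):          # one contiguous cluster
            enhanced[s:e] += ((e - s) ** e_weight) * (h ** h_weight) * dh
    return enhanced

# A wide suprathreshold cluster is enhanced more than an isolated peak
t_map = np.array([0.2, 0.5, 2.1, 2.4, 2.2, 0.3, 1.9, 0.1])
print(tfce_1d(t_map).round(2))
```

Because extent and height both feed the score, a broad cluster of moderate t-values and a sharp isolated peak can both survive, without committing to one arbitrary threshold.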
3.3. Control analysis: Pauses

IDS has been related to an increased number of pauses compared
to ADS (Martin et al., 2016), which may form acoustic edges that can
contribute to speech-brain coherence (Gross et al., 2013). In line with
earlier findings, the IDS-condition (25 pauses/min, SD = 11.3) had a
higher pause rate than the ADS-condition (17.3 pauses/min, SD = 11.1),
t(29) = 3.82, p < .001. Pause durations did not differ between the
two conditions (M_IDS = 259 ms, SD = 75; M_ADS = 250 ms, SD = 78),
t(29) = 0.63, p = .536. To assess whether the increased number of
pauses in IDS contributes to the IDS advantage for tracking, we compared
phase-clustering from 1 to 8 Hz (in steps of 0.5 Hz) at word onsets
following pauses, which thus form an acoustic edge, to phase-clustering
at word onsets within continuous speech. The analysis assessed phase-
clustering from 100 ms before word onset until 1 s after, in steps of
10 ms, for each electrode individually, and the number of word onsets
contributing to the analysis was matched. Our analysis used cluster-
based permutation for multiple-comparison correction and showed no
significant difference in phase-clustering between the two types of word
onsets (p > .1). Next, we compared phase-clustering at pause offset be-
tween the IDS- and the ADS-condition using the same frequencies and
time window. The cluster-based permutation analysis showed no signif-
icant difference in phase-clustering between the two conditions (p > .1),
giving no evidence that infants' neural responses to pauses differed be-
tween IDS and ADS. Lastly, we compared tracking at the stress rate
between infants with a higher rate of pauses and infants with a lower
rate of pauses, grouped based on a median split (median = 25.8). The
two groups showed no significant difference in tracking, t(29) = 0.69,
p = .5. While this does not exclude the possibility that pauses and as-
sociated acoustic edges increase speech-brain coherence, we find no ev-
idence that they are the main driver of the IDS facilitation for tracking
at the stress rate.
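Phase-clustering as used here corresponds to inter-trial phase coherence: the length of the mean resultant vector of band-limited phases across epochs time-locked to an event such as a word onset. The sketch below shows this measure on simulated trials; the signal parameters and names are hypothetical, and the authors' analysis was additionally run per electrode, frequency, and 10-ms time step with cluster-based permutation testing.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def phase_clustering(epochs, fs, freq, bw=0.5):
    """epochs: (n_trials, n_samples) EEG time-locked to word onsets.
    Returns inter-trial phase coherence over time at freq +/- bw Hz."""
    sos = butter(2, [freq - bw, freq + bw], btype="band", fs=fs, output="sos")
    phases = np.angle(hilbert(sosfiltfilt(sos, epochs, axis=1), axis=1))
    # length of the mean resultant vector: 1 = perfect phase alignment
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

rng = np.random.default_rng(2)
fs, n_trials = 250, 60
t = np.arange(int(1.1 * fs)) / fs      # 1.1 s window, as in the analysis
# trials phase-locked to a 2 Hz response vs. trials with random phase
locked = np.sin(2 * np.pi * 2 * t) + rng.normal(0, 1, (n_trials, t.size))
shifts = rng.uniform(0, 2 * np.pi, n_trials)[:, None]
jittered = np.sin(2 * np.pi * 2 * t + shifts) + rng.normal(0, 1, (n_trials, t.size))
print(phase_clustering(locked, fs, 2.0).mean(),
      phase_clustering(jittered, fs, 2.0).mean())
```

Phase-locked trials yield values near 1 in the target band, while trials with random onset phase stay close to the chance level of roughly 1/sqrt(n_trials).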
4. Discussion

The present study set out to investigate infants' neural tracking of
natural IDS compared to ADS and to delineate whether the IDS facilitation
is driven by prosodic stress. We observed significant tracking of
speech at both the stress and the syllable rate during natural interactions
of 9-month-olds with their mothers. Adding to previous findings,
we report here that tracking is facilitated by IDS and that this effect is
specific to the prosodic stress rate. This suggests that the IDS advantage
for infants' tracking is specifically based on enhanced prosodic stress
and not on the syllable rhythm. Our finding emphasizes the important
role of IDS for infants' speech processing and possibly their language
development.

At the age of 9 months, infants have started to segment words from
continuous speech (Junge et al., 2014; Jusczyk et al., 1999; Männel
and Friederici, 2013), a process facilitated by IDS (Schreiner and Mani, 2017).
Speech segmentation is crucial for the acquisition of higher-level
linguistic meaning, and better word segmentation in infancy has been shown
to predict later vocabulary size (Junge et al., 2012) and syntactic skills
(Kooijman et al., 2013). Since continuous speech contains no pauses between
words, infants must rely on other cues to detect word boundaries.
In stress-based languages like English or German, stressed syllables can
provide a valuable cue for segmenting words from continuous speech
(Jusczyk et al., 1999), as the majority of content words in these languages
have word-initial stress (Cutler and Carter, 1987; Stärk et al., 2021).
Our study shows that not only do mothers enhance their amplitude
modulations at the prosodic stress rate in IDS, but infants also track
this enhancement. This suggests that tracking might facilitate
higher-level inferential processes such as word segmentation.
Because of the way this study was set up, the IDS-condition included
a number of additional ostensive cues that were not present in the
ADS-condition. Most relevant are mutual gaze between mother and infant
and calling of the infant's name, as mothers were specifically instructed
to use these cues. In addition, it is possible that mothers increased
other visual cues in the IDS-condition, as adults have been shown to
exaggerate facial expressions such as lip and head movements when
addressing children (Green et al., 2010; Smith and Strader, 2014;
Swerts and Krahmer, 2010), which we were unable to assess in the
current study. These ostensive cues are special in that they help guide
infants' attention to maternal speech (Csibra and Gergely, 2006; 2009)
and may consequently have enhanced infants' speech processing (for
a review, see Çetinçelik et al., 2020). However, we find that the IDS-
condition specifically facilitated tracking at the prosodic stress rate, with
no evidence for an IDS facilitation at the syllable rate. This finding is
not compatible with a general increase of attention to maternal speech
through ostensive cues in the IDS-condition. In addition, our control
analyses showed that the IDS benefit for tracking persists even after we
excluded epochs with mutual eye gaze, and that infants who experienced
more calling of their name did not show higher tracking of IDS at the
prosodic stress rate than infants who experienced less calling of their
name. These results do not imply that visual information is irrelevant
for speech processing. Previous studies have shown that visual information
increases tracking of speech in adults (Bourguignon et al., 2020;
Crosse et al., 2015) and likely also in children (Power et al., 2012). As
our design does not allow us to investigate whether the frequency of visual
exaggerations in the IDS-condition coincides with the prosodic stress
rate, we conducted a control analysis excluding all epochs during which
the infant looked at the mother. Even for the parts of the interactions in
which the infants did not look at the mother, the IDS tracking advantage
at the prosodic stress rate persisted. This supports our conclusion that
the IDS benefit for speech processing results from its acoustic properties,
even though we cannot fully exclude the possibility that infants still
perceived some exaggerated visual cues without looking directly
at the mother's face. Further studies are needed to dissociate the unique
contributions of visual and acoustic cues to infants' neural processing of
IDS.
Regarding parental acoustic speech modulations, the enhanced amplitude
modulation at the slow stress rate could assist infants' tracking
of speech by increasing rhythmic cues. Natural speech is not perfectly
regular, and this lack of a clear rhythm is a challenge for the synchronization
between neural activity and speech input. In adults, linguistic knowledge
can compensate for the lack of rhythm through top-down modulation of
auditory activity via linguistic predictions (Keitel et al., 2017; Meyer
et al., 2019; Rimmele et al., 2018; Ten Oever and Martin, 2021). Yet,
preverbal infants still lack the linguistic knowledge required for such
predictions. The enhancement of slow amplitude modulations in IDS
could compensate for this by providing additional acoustic cues
that aid tracking at the prosodic stress rate. A second possibility is
that IDS modulates tracking by increasing infants' attention, possibly
via a combination of visual and acoustic cues. The typical acoustic
correlates of IDS have been shown to increase infants' attention compared to
ADS (ManyBabies Consortium, 2020; Cooper and Aslin, 1990; Kaplan
et al., 1995; Roberts et al., 2013). Neural tracking is affected by attention
(Fuglsang et al., 2017) and reflects the selection of relevant attended
information (Obleser and Kayser, 2019). Increased tracking of IDS at the
prosodic stress rate may thus reflect 9-month-olds' enhanced attention
to prosodic stress, which provides them with a relevant acoustic cue
aiding word segmentation. These two interpretations are not mutually
exclusive but may explain our findings as a combination of enhanced
acoustic cues in maternal speech and increased attention of the infant
to prosodic stress in IDS.
One question that we cannot address is whether the enhanced
synchronization between neural activity and IDS observed here results
from genuine entrainment of endogenous oscillations or from auditory-
evoked responses (see Keitel et al., 2021). It has been suggested that
oscillations in the auditory cortex phase-lock to acoustic information in
a frequency-specific manner (Lakatos et al., 2013). In speech processing,
F0 amplitude rhythms might entrain neural oscillations in the delta
frequency (Bourguignon et al., 2013). For our current results, this could
indicate that the amplitude edges or peaks at the prosodic stress rate of
IDS provide sufficient rhythmic cues to allow for a phase alignment of
oscillatory activity operating in the frequency range of prosodic stress.
Another possibility is that the exaggeration of prosodic stress in IDS
leads to a series of evoked responses that are superimposed on neural
activity and thus appear in the same frequency band as the prosodic
stress rate. Our results are compatible with both explanations; future
work is therefore required to distinguish these two accounts of infants'
processing of IDS. Since both possibilities result in increased neural
processing of acoustic information at the prosodic stress rate in IDS, they
are also both compatible with our interpretation that tracking facilitates
infants' word segmentation from continuous IDS.
Our study provides further evidence for the previously proposed
importance of prosody in assisting speech processing. This is especially
relevant in light of healthy parent-infant interactions, given evidence
that clinically depressed mothers produce less IDS, potentially impacting
children's language development (Lam-Cassettari and Kohlhoff, 2020;
Liu et al., 2017; Stein et al., 2008). In healthy parent-infant interactions,
IDS may be optimally adapted to infants' needs during language
development (see Kalashnikova and Burnham, 2018). As infants grow
older, the amount of parents' IDS decreases and its acoustic
characteristics change (Kitamura and Burnham, 2003; Raneri et al., 2020).
Leong et al. (2017) showed that the enhancement of prosodic amplitude
modulations in IDS decreases when mothers talk to older infants.
These changes in IDS may be tied to infants' increased linguistic knowledge,
as parents have been shown to use more prototypically infant-directed
speech when talking to infants with lower language abilities (Bohannon
and Marquis, 1977; Kalashnikova et al., 2020; Reissland and Stephenson,
1999). Importantly, speech tracking has been shown to increase with
linguistic knowledge (Chen et al., 2020; Choi et al., 2020), meaning that
infants' tracking may rely less on acoustic cues in IDS as their linguistic
knowledge increases. This implies that parents adapt the acoustic
properties of their speech to their infants' language development, allowing
for a level of tracking that is optimal for the infants' current language
status. Future studies need to evaluate the interaction between parents'
speech adaptations and infants' linguistic knowledge in infants' tracking
of speech. The current study contributes an empirical foundation for
such future investigations by showing that neural tracking is sensitive
to parents' speech adaptations during natural interactions, likely
facilitating higher-level inferential processes such as word segmentation. This
makes tracking a potential neural mechanism for infants' word
segmentation from continuous speech.
Data and Code Availability Statement
Data availability
The conditions of our ethics approval do not permit public archiving
of participant data. Readers seeking access to the data should contact
the corresponding author to arrange a formal data sharing agreement.
Code availability

Preprocessing of the EEG data was done using the publicly
available HAPPE pipeline V1 (DOI: 10.3389/fnins.2018.00097;
download: https://github.com/lcnhappe/happe) in EEGLAB
v2019.1 (DOI: https://doi.org/10.1515/bmt-2013-4182; download:
https://sccn.ucsd.edu/eeglab/download.php) and in FieldTrip (version
from 20200521) (DOI: https://doi.org/10.1155/2011/156869;
download: https://www.fieldtriptoolbox.org/download.php). Custom
code was written for the computation of speech envelopes and Hilbert
coherence and will be made available if the article is accepted for
publication.
Declaration of Competing Interest
The authors declare that there is no conflict of interest.
4.1. Funding
This work was supported by the Max Planck Society. The funders
had no role in the conceptualization, design, data collection, analysis,
decision to publish, or preparation of the manuscript.
Credit authorship contribution statement
Katharina H. Menn: Conceptualization, Formal analysis, Visualization,
Writing – original draft. Christine Michel: Conceptualization,
Investigation, Data curation, Writing – review & editing. Lars Meyer:
Conceptualization, Formal analysis, Writing – original draft, Supervision.
Stefanie Hoehl: Conceptualization, Resources, Writing – review
& editing. Claudia Männel: Conceptualization, Supervision, Writing –
original draft.
Acknowledgments
We are grateful to the infants and parents who participated. We
thank Milena Marx, Katja Höhne, Ole Scholand, Johanna Bayón,
Leonie Grandpierre, Melanie Schwan, Annika Behlen and Ann Sophie
von Schwartzenberg for their assistance with data collection, Claudia
Geißler and Sophia Richter for their help with the speech processing,
Christina Münchberger, Luka Büttner and Florian Teichmann for coding
the eye gaze, and Johanna Lieb for her assistance with the data
preparation.
Supplementary material
Supplementary material associated with this article can be found, in
the online version, at doi: 10.1016/j.neuroimage.2022.118991
References
Adriaans, F. , Swingley, D. , 2017. Prosodic exaggeration within infant-directed speech:
consequences for vowel learnability. J. Acoust. Soc. Am. 141 (5), 3070–3078 .
Attaheri, A., Choisdealbha, Á.N., Di Liberto, G.M., Rocha, S., Brusini, P., Mead, N., Olawole-Scott, H., Boutris, P., Gibbon, S., Williams, I., et al., 2022. Delta- and theta-band cortical tracking and phase-amplitude coupling to sung speech by infants. Neuroimage 118698. doi: 10.1016/j.neuroimage.2021.118698.
Begus, K., Southgate, V., Gliga, T., 2015. Neural mechanisms of infant learning: differences in frontal theta activity during object exploration modulate subsequent object recognition. Biol. Lett. 11 (5), 20150041.
Boersma, P., 2001. Praat, a system for doing phonetics by computer. Glot International 5 (9), 341–345. https://hdl.handle.net/11245/1.200596.
Bohannon, I., Marquis, A.L., 1977. Children's control of adult speech. Child Dev, 1002–1008. doi: 10.2307/1128352.
Bourguignon, M., Baart, M., Kapnoula, E.C., Molinaro, N., 2020. Lip-reading enables the brain to synthesize auditory features of unknown silent speech. J. Neurosci. 40 (5), 1053–1065. doi: 10.1523/JNEUROSCI.1101-19.2019.
Bourguignon, M., De Tiege, X., De Beeck, M.O., Ligot, N., Paquier, P., Van Bogaert, P., Goldman, S., Hari, R., Jousmäki, V., 2013. The pace of prosodic phrasing couples the listener's cortex to the reader's voice. Hum Brain Mapp 34 (2), 314–326. doi: 10.1002/hbm.21442.
Castellanos, N.P., Makarov, V.A., 2006. Recovering EEG brain signals: artifact suppression with wavelet enhanced independent component analysis. J. Neurosci. Methods 158 (2), 300–312. doi: 10.1016/j.jneumeth.2006.05.033.
Çetinçelik, M., Rowland, C.F., Snijders, T.M., 2020. Do the eyes have it? a systematic review on the role of eye gaze in infant language development. Front Psychol doi: 10.3389/fpsyg.2020.589096.
Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., Ghazanfar, A.A., 2009. The natural statistics of audiovisual speech. PLoS Comput. Biol. 5 (7).
Chen, Y., Jin, P., Ding, N., 2020. The influence of linguistic information on cortical tracking of words. Neuropsychologia 148, 107640. doi: 10.1016/j.neuropsychologia.2020.107640.
de Cheveigné, A., 2020. Zapline: a simple and effective method to remove power line artifacts. Neuroimage 207, 116356. doi: 10.1016/j.neuroimage.2019.116356.
Choi, D., Batterink, L.J., Black, A.K., Paller, K.A., Werker, J.F., 2020. Preverbal infants discover statistical word patterns at similar rates as adults: evidence from neural entrainment. Psychol Sci 31 (9), 1161–1173. doi: 10.1177/0956797620933237.
ManyBabies Consortium, 2020. Quantifying sources of variability in infancy research using the infant-directed-speech preference. Advances in Methods and Practices in Psychological Science 3 (1), 24–52. doi: 10.1177/2515245919900809.
Cooper, R.P., Aslin, R.N., 1990. Preference for infant-directed speech in the first month after birth. Child Dev 61 (5), 1584–1595. doi: 10.1111/j.1467-8624.1990.tb02885.x.
Cristia, A., 2013. Input to language: the phonetics and perception of infant-directed speech. Linguistics and Language Compass 7 (3), 157–170. doi: 10.1111/lnc3.12015.
Crosse, M.J., Butler, J.S., Lalor, E.C., 2015. Congruent visual speech enhances cortical entrainment to continuous auditory speech in noise-free conditions. J. Neurosci. 35 (42), 14195–14204. doi: 10.1523/JNEUROSCI.1829-15.2015.
Csibra, G., Gergely, G., 2006. Social learning and social cognition: the case for pedagogy. Processes of change in brain and cognitive development. Attention and performance XXI 21, 249–274.
Csibra, G., Gergely, G., 2009. Natural pedagogy. Trends Cogn. Sci. (Regul. Ed.) 13 (4), 148–153. doi: 10.1016/j.tics.2009.01.005.
Cutler, A., Carter, D.M., 1987. The predominance of strong initial syllables in the English vocabulary. Computer Speech & Language 2 (3–4), 133–142. doi: 10.1016/0885-2308(87)90004-0.
Delorme, A., Makeig, S., 2004. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134 (1), 9–21. doi: 10.1016/j.jneumeth.2003.10.009.
Doelling, K.B., Arnal, L.H., Ghitza, O., Poeppel, D., 2014. Acoustic landmarks drive delta–theta oscillations to enable speech comprehension by facilitating perceptual parsing. Neuroimage 85, 761–768. doi: 10.1016/j.neuroimage.2013.06.035.
Fernald, A., Simon, T., 1984. Expanded intonation contours in mothers' speech to newborns. Dev Psychol 20 (1), 104. doi: 10.1037/0012-1649.20.1.104.
Fernald, A., Taeschner, T., Dunn, J., Papousek, M., de Boysson-Bardies, B., Fukui, I., 1989. A cross-language study of prosodic modifications in mothers' and fathers' speech to preverbal infants. J Child Lang 16 (3), 477–501. doi: 10.1017/S0305000900010679.
Fuglsang, S.A., Dau, T., Hjortkjær, J., 2017. Noise-robust cortical tracking of attended speech in real-world acoustic scenes. Neuroimage 156, 435–444. doi: 10.1016/j.neuroimage.2017.04.026.
Gabard-Durnam, L.J., Mendez Leal, A.S., Wilkinson, C.L., Levin, A.R., 2018. The Harvard automated processing pipeline for electroencephalography (HAPPE): standardized processing software for developmental and high-artifact data. Front Neurosci 12, 97. doi: 10.3389/fnins.2018.00097.
Goswami, U., 2019. Speech rhythm and language acquisition: an amplitude modulation phase hierarchy perspective. Ann. N. Y. Acad. Sci. 1453, 67–78. doi: 10.1111/nyas.14137.
Graf Estes, K., Hurley, K., 2013. Infant-directed prosody helps infants map sounds to meanings. Infancy 18 (5), 797–824. doi: 10.1111/infa.12006.
Green, J.R., Nip, I.S.B., Wilson, E.M., Mefferd, A.S., Yunusova, Y., 2010. Lip movement exaggerations during infant-directed speech. Journal of Speech, Language, and Hearing Research 53 (6), 1529–1542. doi: 10.1044/1092-4388(2010/09-0005).
Grieser, D.L., Kuhl, P.K., 1988. Maternal speech to infants in a tonal language: support for universal prosodic features in motherese. Dev Psychol 24 (1), 14. doi: 10.1037/0012-1649.24.1.14.
Gross, J., Hoogenboom, N., Thut, G., Schyns, P., Panzeri, S., Belin, P., Garrod, S., 2013. Speech rhythms and multiplexed oscillatory sensory coding in the human brain. PLoS Biol. 11 (12). doi: 10.1371/journal.pbio.1001752.
Háden, G.P., Mády, K., Török, M., Winkler, I., 2020. Newborn infants differently process adult directed and infant directed speech. International Journal of Psychophysiology 147, 107–112. doi: 10.1016/j.ijpsycho.2019.10.011.
Hoehl, S., Michel, C., Reid, V.M., Parise, E., Striano, T., 2014. Eye contact during live social interaction modulates infants' oscillatory brain activity. Soc Neurosci 9 (3), 300–308.
Jessen, S., Fiedler, L., Münte, T.F., Obleser, J., 2019. Quantifying the individual auditory and visual brain response in 7-month-old infants watching a brief cartoon movie. Neuroimage 202, 116060. doi: 10.1016/j.neuroimage.2019.116060.
Jessen, S., Obleser, J., Tune, S., 2021. Neural tracking in infants – an analytical tool for multisensory social processing in development. Dev Cogn Neurosci 101034. doi: 10.1016/j.dcn.2021.101034.
Junge, C., Cutler, A., Hagoort, P., 2014. Successful word recognition by 10-month-olds given continuous speech both at initial exposure and test. Infancy 19 (2), 179–193.
Junge, C., Kooijman, V., Hagoort, P., Cutler, A., 2012. Rapid recognition at 10 months as a predictor of language development. Dev Sci 15 (4), 463–473. doi: 10.1111/j.1467-7687.2012.1144.x.
Jusczyk, P.W., Houston, D.M., Newsome, M., 1999. The beginnings of word segmentation in English-learning infants. Cogn Psychol 39 (3–4), 159–207. doi: 10.1006/cogp.1999.0716.
Kalashnikova, M., Burnham, D., 2018. Infant-directed speech from seven to nineteen months has similar acoustic properties but different functions. J Child Lang 45 (5), 1035–1053. doi: 10.1017/S0305000917000629.
Kalashnikova, M., Goswami, U., Burnham, D., 2020. Infant-directed speech to infants at risk for dyslexia: a novel cross-dyad design. Infancy 25 (3), 286–303. doi: 10.1111/infa.12329.
Kalashnikova, M., Peter, V., Di Liberto, G.M., Lalor, E.C., Burnham, D., 2018. Infant-directed speech facilitates seven-month-old infants' cortical tracking of speech. Sci Rep 8 (1), 1–8. doi: 10.1038/s41598-018-32150-6.
Kaplan, P.S., Goldstein, M.H., Huckeby, E.R., Owren, M.J., Cooper, R.P., 1995. Dishabituation of visual attention by infant- versus adult-directed speech: effects of frequency modulation and spectral composition. Infant Behavior and Development 18 (2), 209–223. doi: 10.1016/0163-6383(95)90050-0.
Katz, G.S., Cohn, J.F., Moore, C.A., 1996. A combination of vocal f0 dynamic and summary features discriminates between three pragmatic categories of infant-directed speech. Child Dev 67 (1), 205–217. doi: 10.1111/j.1467-8624.1996.tb01729.x.
Keitel, A., Ince, R.A., Gross, J., Kayser, C., 2017. Auditory cortical delta-entrainment interacts with oscillatory power in multiple fronto-parietal networks. Neuroimage 147, 32–42. doi: 10.1016/j.neuroimage.2016.11.062.
Keitel, C., Obleser, J., Jessen, S., Henry, M.J., 2021. Frequency-specific effects in infant electroencephalograms do not require entrained neural oscillations: a commentary on Köster et al. (2019). Psychol Sci. 09567976211001317.
Kitamura, C., Burnham, D., 2003. Pitch and communicative intent in mother's speech: adjustments for age and sex in the first year. Infancy 4 (1), 85–110. doi: 10.1207/S15327078IN0401_5.
Kooijman, V., Hagoort, P., Cutler, A., 2009. Prosodic structure in early word segmentation: ERP evidence from Dutch ten-month-olds. Infancy 14 (6), 591–612.
Kooijman, V., Junge, C., Johnson, E.K., Hagoort, P., Cutler, A., 2013. Predictive brain signals of linguistic development. Front Psychol 4, 25.
Lakatos, P., Musacchia, G., O'Connel, M.N., Falchier, A.Y., Javitt, D.C., Schroeder, C.E., 2013. The spectrotemporal filter mechanism of auditory selective attention. Neuron 77 (4), 750–761. doi: 10.1016/j.neuron.2012.11.034. https://www.sciencedirect.com/science/article/pii/S0896627312011270.
Lam-Cassettari, C., Kohlhoff, J., 2020. Effect of maternal depression on infant-directed speech to prelinguistic infants: implications for language development. PLoS ONE 15 (7), e0236787.
Leong, V., Goswami, U., 2015. Acoustic-emergent phonology in the amplitude envelope of child-directed speech. PLoS ONE 10 (12), 1–37. doi: 10.1371/journal.pone.0144411.
Leong, V., Kalashnikova, M., Burnham, D., Goswami, U., 2017. The temporal modulation structure of infant-directed speech. Open Mind 1 (2), 78–90. doi: 10.1162/OPMI_a_00008.
Liu, Y., Kaaya, S., Chai, J., McCoy, D., Surkan, P., Black, M., Sutter-Dallay, A.-L., Verdoux, H., Smith-Fawzi, M., 2017. Maternal depressive symptoms and early childhood cognitive development: a meta-analysis. Psychol Med 47 (4), 680–689.
Lloyd-Fox, S., Széplaki-Köllőd, B., Yin, J., Csibra, G., 2015. Are you talking to me? neural activations in 6-month-old infants in response to being addressed during natural interactions. Cortex 70, 35–48.
Männel, C., Friederici, A.D., 2013. Accentuate or repeat? brain signatures of developmental periods in infant word recognition. Cortex 49 (10), 2788–2798. doi: 10.1016/j.cortex.2013.09.003.
Martin, A., Igarashi, Y., Jincho, N., Mazuka, R., 2016. Utterances in infant-directed speech are shorter, not slower. Cognition 156, 52–59. doi: 10.1016/j.cognition.2016.07.015.
Meyer, L., 2018. The neural oscillations of speech processing and language comprehension: state of the art and emerging mechanisms. European Journal of Neuroscience 48 (7), 2609–2621. doi: 10.1111/ejn.13748.
Meyer, L., Sun, Y., Martin, A.E., 2019. Synchronous, but not entrained: exogenous and endogenous cortical rhythms of speech and language processing. Lang Cogn Neurosci 0 (0), 1–11. doi: 10.1080/23273798.2019.1693050.
Michel, C., Matthes, D., Hoehl, S., 2021. Neural and behavioral correlates of ostensive cues in naturalistic mother-infant interactions. Manuscript in preparation.
Obleser, J., Kayser, C., 2019. Neural entrainment and attentional selection in the listening brain. Trends Cogn. Sci. (Regul. Ed.) 23 (11), 913–926. doi: 10.1016/j.tics.2019.08.004.
Oostenveld, R., Fries, P., Maris, E., Schoffelen, J.-M., 2011. FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Comput Intell Neurosci 2011. doi: 10.1155/2011/156869.
Ortiz Barajas, M.C., Guevara, R., Gervain, J., 2021. The origins and development of speech envelope tracking during the first months of life. Dev Cogn Neurosci 48, 100915. doi: 10.1016/j.dcn.2021.100915.
Parks, T., McClellan, J., 1972. Chebyshev approximation for nonrecursive digital filters with linear phase. IEEE Transactions on Circuit Theory 19 (2), 189–194.
Peelle, J.E., Gross, J., Davis, M.H., 2013. Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cerebral Cortex 23 (6), 1378–1387. doi: 10.1093/cercor/bhs118.
Power, A.J., Mead, N., Barnes, L., Goswami, U., 2012. Neural entrainment to rhythmically presented auditory, visual, and audio-visual speech in children. Front Psychol 3, 216.
Ramírez-Esparza, N., García-Sierra, A., Kuhl, P.K., 2014. Look who's talking: speech style and social context in language input to infants are linked to concurrent and future speech development. Dev Sci 17 (6), 880–891. doi: 10.1111/desc.12172.
Raneri, D., Von Holzen, K., Newman, R., Bernstein Ratner, N., 2020. Change in maternal speech rate to preverbal infants over the first two years of life. J Child Lang 47 (6), 1263–1275. doi: 10.1017/S030500091900093X.
Reissland, N., Stephenson, T., 1999. Turn-taking in early vocal interaction: a comparison of premature and term infants' vocal interaction with their mothers. Child Care Health Dev 25 (6), 447–456. doi: 10.1046/j.1365-2214.1999.00109.x.
Rimmele, J.M., Morillon, B., Poeppel, D., Arnal, L.H., 2018. Proactive sensing of periodic and aperiodic auditory patterns. Trends Cogn. Sci. (Regul. Ed.) 22 (10), 870–882. doi: 10.1016/j.tics.2018.08.003.
Roberts, S., Fyfield, R., Baibazarova, E., van Goozen, S., Culling, J.F., Hay, D.F., 2013. Parental speech at 6 months predicts joint attention at 12 months. Infancy 18, E1–E15. doi: 10.1111/infa.12018.
Schreiner, M.S., Mani, N., 2017. Listen up! developmental differences in the impact of IDS on speech segmentation. Cognition 160, 98–102. doi: 10.1016/j.cognition.2016.12.003.
Singh, L., Nestor, S., Parikh, C., Yull, A., 2009. Influences of infant-directed speech on early word recognition. Infancy 14 (6), 654–666. doi: 10.1080/15250000903263973.
Smith, N.A., Strader, H.L., 2014. Infant-directed visual prosody: Mothers' head movements and speech acoustics. Interaction Studies 15 (1), 38–54.
Smith, S.M., Nichols, T.E., 2009. Threshold-free cluster enhancement: addressing problems of smoothing, threshold dependence and localisation in cluster inference. Neuroimage 44 (1), 83–98. doi: 10.1016/j.neuroimage.2008.03.061. https://www.sciencedirect.com/science/article/pii/S1053811908002978.
Smith, Z.M., Delgutte, B., Oxenham, A.J., 2002. Chimaeric sounds reveal dichotomies in auditory perception. Nature 416 (6876), 87–90. doi: 10.1038/416087a.
Soderstrom, M., 2007. Beyond babytalk: re-evaluating the nature and content of speech input to preverbal infants. Developmental Review 27 (4), 501–532. doi: 10.1016/j.dr.2007.06.002.
Soderstrom, M., Blossom, M., Foygel, R., Morgan, J.L., 2008. Acoustical cues and grammatical units in speech to two preverbal infants. J Child Lang 35 (4), 869–902. doi: 10.1017/S0305000908008763.
Spinelli, M., Fasolo, M., Mesman, J., 2017. Does prosody make the difference? a meta-analysis on relations between prosodic aspects of infant-directed speech and infant outcomes. Developmental Review 44, 1–18. doi: 10.1016/j.dr.2016.12.001.
Stärk, K., Kidd, E., Frost, R.L., 2021. Word segmentation cues in German child-directed speech: a corpus analysis. Lang Speech (February). doi: 10.1177/0023830920979016.
Stein, A., Malmberg, L.-E., Sylva, K., Barnes, J., Leach, P., the FCCC team, 2008. The influence of maternal depression, caregiving, and socioeconomic status in the post-natal year on children's language development. Child Care Health Dev 34 (5), 603–612.
Swerts, M., Krahmer, E., 2010. Visual prosody of newsreaders: effects of information structure, emotional content and intended audience on facial expressions. J Phon 38 (2), 197–206.
Ten Oever, S., Martin, A.E., 2021. An oscillating computational model can track pseudo-rhythmic speech by using linguistic predictions. eLife 10, e68066.
Thiessen, E.D., Hill, E.A., Saffran, J.R., 2005. Infant-directed speech facilitates word segmentation. Infancy 7 (1), 53–71. doi: 10.1207/s15327078in0701_5.
Weisleder, A., Fernald, A., 2013. Talking to children matters: early language experience strengthens processing and builds vocabulary. Psychol Sci 24 (11), 2143–2152. doi: 10.1177/0956797613488145.
Winkler, I., Haufe, S., Tangermann, M., 2011. Automatic classification of artifactual ICA components for artifact removal in EEG signals. Behavioral and Brain Functions 7 (1), 1–15. doi: 10.1186/1744-9081-7-30.
Zangl, R., Mills, D.L., 2007. Increased brain activity to infant-directed speech in 6- and 13-month-old infants. Infancy 11 (1), 31–62.