
Variable pronunciations reveal dynamic intra-speaker variation in speech planning


Abstract

In two speech production experiments, we investigated the link between phonetic variation and the scope of advance planning at the word form encoding stage. We examined cases where a word has, in addition to the pronunciation of the word in isolation, a context-specific pronunciation variant that appears only when the following word includes specific sounds. To the extent that the speaker uses the variant specific to the following context, we can infer that the phonological content of the upcoming word is included in the current planning scope. We hypothesize that the time alignment between selection of the phonetic variant in the currently-being-encoded word and retrieval of segmental details of the upcoming word is variable from moment to moment depending on current task demands and the dynamics of lexical access for each word involved. The results showed that the use of a context-sensitive phonetic variant of /t/ (“flapping”) by English speakers reliably increased under conditions which favor advance planning. Our hypothesis was supported by evidence compatible with its three key predictions: an increase in flapping with higher frequency nouns, more flapping in a procedure with a response delay relative to a speeded response, and an attenuation of the noun frequency effect with delayed responses. In addition to revealing the dynamic scope of planning within individuals, this study has shown that contextually-conditioned phonetic variants can be used to index the planning of upcoming words at the word form level.
Short Title: Variable pronunciations reveal dynamic intra-speaker variation in speech planning
Oriana Kilbourn-Ceron a,* and Matthew Goldrick a
a Department of Linguistics, Northwestern University, Evanston IL
*Corresponding author:
Oriana Kilbourn-Ceron
Department of Linguistics, Northwestern University
2016 Sheridan Rd.
Evanston, IL 60208 USA
Tel. (773) 357-6481
Keywords: Speech production; psycholinguistics; phonology
Funding sources: Fonds de recherche du Québec : Société et culture
In two speech production experiments, we investigated the link between phonetic variation and the
scope of advance planning at the word form encoding stage. We examined cases where a word has,
in addition to the pronunciation of the word in isolation, a context-specific pronunciation variant that
appears only when the following word includes specific sounds. To the extent that the speaker uses the
variant specific to the following context, we can infer that the phonological content of the upcoming word
is included in the current planning scope.
We hypothesize that the time alignment between selection of the phonetic variant in the currently-
being-encoded word and retrieval of segmental details of the upcoming word is variable from moment
to moment depending on current task demands and the dynamics of lexical access for each word involved.
The results showed that the use of a context-sensitive phonetic variant of /t/ (“flapping”) by
English speakers reliably increased under conditions which favor advance planning. Our hypothesis was
supported by evidence compatible with its three key predictions: an increase in flapping in phrases with
a higher frequency *following* word, more flapping in a procedure with a response delay relative to
a speeded response, and an attenuation of the following word frequency effect with delayed responses.
This reveals that within speakers, the degree of advance planning varies continuously from moment to
moment, reflecting (in part) the accessibility of form properties of individual words in the utterance.
During the production of speech, there is a tension between planning multiple words in advance, which
allows for fluent speech, and reducing the load on working memory, so as to alleviate interference from
non-initial words (which could lead to speech errors or delays; Ferreira & Swets, 2002; V. Wagner et al.,
2010). This paper investigates the factors which influence variation in planning scope within word form
encoding processes. Previous studies have found that word form processing can take place for multiple
words in advance prior to speech initiation (Oppermann et al., 2010; Schnur, 2011; Wynne et al., 2018,
among others), but the degree of advance planning may differ between tasks and between individuals (Michel
Lange & Laganaro, 2014; Schriefers & Teruel, 1999). This paper advances the proposal that the extent to
which multiple word forms are encoded in advance varies continuously within individuals as a function of
the processing time required by each word in the utterance and of the task conditions, which can require or
discourage rapid initiation of speech.
We test this hypothesis in two speech production experiments that measure the degree to which phonetic
outcomes of utterance-initial words, i.e., their pronunciations, are influenced by the phonological forms of
words that follow.
In our pre-registered experiments, we tested how word-specific properties (frequency and length) and
task-specific demands (speeded vs. delayed productions) affected phonetic outcomes in the production of
short phrases. In speeded productions there was increased use of the contextually-conditioned variant (i.e.,
grea[ɾ] artist) for phrases with higher frequency words. When a response delay was imposed – allowing more
planning time – there was overall more use of the contextually-conditioned variant, but also a significant
reduction in the effect of the second word’s frequency, suggesting this frequency effect is specifically linked
to advance planning of the second word. This provides new insights into the dynamic nature of advance
planning during word form encoding, showing that the extent of advance planning varies not only between
speakers or tasks, but from moment to moment within speakers.
Word form encoding
Word form encoding is the process of mapping a grammatical representation to its corresponding sensorimotor
representation, which is used to generate speech movements. In psycholinguistic theories of speech
production, word form encoding is typically assumed to require at least two stages:[1] phonological encoding
and phonetic encoding. Phonological encoding involves the retrieval of the segmental content associated
with selected words, construction of a prosodic frame, and association of segments to positions in the frame
(see Goldrick, 2014, for a review). This frame includes minimally the syllabic level, which is organized into
prosodic word groupings (a prosodic word is typically made up of a single content word plus surrounding
unstressed function words; Wheeldon & Lahiri, 1997). Phonetic encoding begins once segments are associated
to prosodic positions; contextual adjustments based on syllable structure and phonological context
(e.g., flapping in English) are then specified in a phonetic representation (with both discrete and gradient aspects),
which in turn serves as the basis for articulatory processing (see Buchwald, 2014, for a review).
[1] It should be noted that research on speech planning is based in large part on speakers of Western European languages. As
the comparative work of O’Seaghdha et al. (2010) demonstrates, speech planning may differ substantially for speakers of other
languages in ways that are challenging to account for in current speech planning theories. The authors acknowledge that these
issues limit the generalizability of the findings reviewed here, and that the same caveat should be applied to the results of the
current study.
Given that each word goes through several sub-stages of encoding, the question arises as to how the
sub-stages of subsequent words are temporally aligned when a speaker must plan several words at once, as
in typical spontaneous speech. One source of evidence comes from the contextual adjustments implemented
during word form encoding, since the adjustments are often influenced by the phonology of surrounding
words. For example, in many varieties of English, the final /t/ sound in the word write is pronounced with
a shorter, voiced articulation called a flap [ɾ] when it is followed by a vowel, as in wri[ɾ] a letter, but with
an affricate [tʃ] in wri[tʃ] you a letter (De Jong, 1998). Since these variants are only used in those specific
phonologically-defined contexts (i.e., followed by a vowel or palatal glide), it must be the case that upcoming
words (e.g., a and you in the above examples) have had their word forms activated sufficiently to provide a
viable context for the selection of the wri[ɾ] or wri[tʃ] variants. However, the degree to which multiple word
forms can or must be planned in advance is a subject of ongoing investigation.
Studies of prepared speech have shown that speakers can engage with multiple word forms prior to
speech onset. The time to initiate prepared speech grows linearly with the number of phonological words,
suggesting that prosodic frames for multiple words can be prepared in advance (Ferreira, 1993; Sternberg et
al., 1978; Wheeldon & Lahiri, 1997, 2002; Wynne et al., 2018). However, these same studies also indicate
that increasing the number of syllables only matters if they are added to the first word, suggesting that, unlike
prosodic word structure, syllabic structure is not planned multiple words in advance (Sternberg et al., 1978;
Wheeldon & Lahiri, 2002; Wynne et al., 2018). This contrast highlights the need to carefully distinguish
between distinct sub-stages of word form encoding when investigating how far in advance speakers are able
to plan.
Studies have also investigated whether the segmental content of non-utterance-initial words is activated
prior to speech onset. Damian & Dumay (2009) used repetition priming to probe the timing of the noun’s
phonological processing in adjective-noun phrases. Pictures described via phrases in which words shared
a segment were named faster than pictures with descriptions which didn’t (e.g., green goat named faster
than green chair). This implies that the segmental content of the second word is to some degree activated
prior to speech onset. The conceptually related phonological consistency paradigm has been used to provide
additional evidence that segmental content of multiple words is co-activated prior to speech onset. In
this paradigm, researchers measure reaction times for determiner-adjective-noun phrases in which the determiner’s
pronunciation depends on the sound that follows (e.g., a/an in English). It has been reported
that phrases with a “matching” adjective and noun (e.g., a purple giraffe, cf. a giraffe) have shorter initiation times
than mismatching phrases like a purple elephant (cf. an elephant) (Spalek et al., 2010). This phonological
consistency effect has been replicated with determiners in other Romance and Germanic languages
(Alario & Caramazza, 2002; Bürki et al., 2014, 2015, 2016; Miozzo & Caramazza, 1999). This phenomenon
suggests that the specification of a determiner’s form takes place at a moment when there is substantial co-
activation of the segmental content of multiple subsequent words.
Studies using the picture-word interference (PWI) paradigm have shown that when naming picture displays
with full sentences, speakers’ initiation times are affected by distractors which are phonologically related
to non-utterance-initial words (Oppermann et al., 2010; Schnur et al., 2006; Schnur, 2011). Other PWI
studies which elicited simple or conjoined noun phrases (e.g., the red mouse, or the arrow and the bag), have
found no phonological priming effects beyond the first prosodic word in the phrase (Meyer, 1996; Michel
Lange & Laganaro, 2014; Schriefers, 1999). However, post-hoc analyses that group participants based on
performance in the experiment suggest that later-occurring syllables can produce priming for participants
that are relatively more accurate (Schriefers, 1999) or respond relatively slowly (Michel Lange & Laganaro,
2014). This suggests that different possibilities are available as to how many words are phonologically
activated prior to speech onset.
In contrast, studies of coarticulation in single word production have shown that speakers are capable of
initiating articulation with very little prepared information (Kawamoto et al., 2015; Liu et al., 2018; Whalen,
1990). This work suggests that under certain task conditions speakers can plan for and launch articulation
in the absence of phonological details about upcoming words, syllables, or even the next segment.
Studies examining spontaneous speech show that it is quite common for speakers to vary in their use of
context-specific variants, even when the phonological environment remains constant (Pierrehumbert, 2001).
A link has been proposed between this phenomenon and the variation in advance planning (Kilbourn-Ceron,
2017; Tamminga et al., 2016; Tanner et al., 2017; M. Wagner, 2012). Kilbourn-Ceron et al. (2020) found that
for a given pair of words A–B, the context-specific variant of word A is much more likely to be used when
word B is predictable from word A. They argue that the predictability of word B allows it to be encoded sooner
relative to word A, therefore increasing the extent of advance planning. However, the link between word-
specific variables like lexical frequency and the extent of advance planning has not yet been investigated in
a controlled experiment, and is the subject of the present study.
The present study
This study investigates the scope of speech planning by measuring the use of flap as a variant of a word-final
/t/ in adjective-noun phrases, e.g., great artist, under conditions that facilitate or delay planning. Since the
flap variant of /t/ only appears if a vowel follows, its presence serves as a diagnostic for simultaneous encoding
of both words in the phrase. We hypothesize that the likelihood of simultaneous word form encoding for
any given utterance depends on the joint influence of task demands and the planning load imposed by individual
words in the utterance. This predicts that speakers should use fewer flaps in a task which encourages
highly incremental planning, and more flaps when the task favors advance planning. We test this prediction
across participants by comparing phrases produced in a speeded or delayed response procedure.
Our hypothesis predicts that when the word forms of nouns take longer to retrieve, vowel-initial nouns
will be less likely to license the use of a flap on the preceding adjective. Our experiments test this prediction
by pairing adjectives with three nouns of varying lexical frequency, a variable which is well-known to affect
reaction times in picture-naming with single words (Oldfield & Wingfield, 1964; Jescheniak & Levelt, 1994),
and is significantly correlated with pre-speech gaze times (Levelt & Meyer, 2000). Our hypothesis predicts
that flapping will be less likely in a phrase with a low frequency noun, e.g., great oyster, compared to a phrase
with a high frequency noun, e.g., great artist. The precise locus or loci of lexical frequency effects is still
being debated, and it is likely that distinct frequency effects arise at multiple levels of processing, ranging
from lexical selection and phonological encoding (Kittredge et al., 2008) to phonetic encoding (Buchwald,
2014). What is crucial for the manipulation here is that at least some of these word-frequency-related delays
arise before phonetic encoding of the adjective is complete (and all context-related adjustments to the adjective
form have been specified). Because the phonological and/or phonetic properties of lower frequency
words take longer to retrieve/specify than those of high frequency words, they are less likely to influence
phonetic encoding of the adjective.
The magnitude of the frequency effect is predicted to vary as a function of task demands. In the speeded
condition, we expect that noun frequency will have a significant effect on the use of flaps, whereas in the
delayed condition the effect will be reduced, since there is more time to retrieve the noun’s word form before
speaking begins.
Experiment 1
The methods, design, predictions and analysis plan for Experiment 1 were pre-registered prior to collection
of data. Deviations from the pre-registered plan are noted at relevant points. The reaction time analyses
were not pre-registered, but are reported for comparability with prior work. Pre-registration documents are
available on this project’s OSF page at
Sample size
The target number of participants was set at 50 based on a Monte Carlo power analysis. Using effect size and
variance estimates from a previous study based on spontaneous speech of the Midlands American variety
(Kilbourn-Ceron et al., 2016), simulated data sets were generated, varying the number of simulated participants
(1000 simulations for each number). A mixed-effects logistic regression model was fit to each set of
simulated data, and a likelihood ratio test assessed whether a significant Noun Frequency effect of magnitude
0.24 could be detected (non-convergent simulations were discarded without replacement). Based on these
simulations, power exceeded 0.8 at 50 participants. Code and results of the power analysis are available on
the OSF page.
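A simulation-based power analysis of this kind can be sketched as follows. This is a minimal illustration, not the authors' analysis (their code is on the OSF page): it fits an ordinary logistic regression with a Wald test rather than a mixed-effects model with a likelihood ratio test, and the baseline intercept, by-participant variability, number of trials, and range of centered frequencies are all hypothetical values chosen for the example. Only the effect size of 0.24 comes from the text.

```python
import math
import random


def wald_z_for_slope(xs, ys, n_iter=15):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson and return the Wald z for b1."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. slope
            h00 += w             # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b1 / math.sqrt(h00 / det)  # slope divided by its standard error


def estimate_power(n_participants, beta=0.24, n_trials=30, n_sims=200,
                   intercept=-2.0, subj_sd=1.0, seed=1):
    """Proportion of simulated data sets in which the slope is significant.

    intercept, subj_sd, n_trials and the frequency range are assumptions
    made for this sketch; they are not values reported in the paper.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        xs, ys = [], []
        for _ in range(n_participants):
            u = rng.gauss(0.0, subj_sd)          # by-participant random intercept
            for _ in range(n_trials):
                x = rng.uniform(-1.5, 1.5)       # centered noun frequency (Zipf)
                p = 1.0 / (1.0 + math.exp(-(intercept + u + beta * x)))
                xs.append(x)
                ys.append(1 if rng.random() < p else 0)
        if abs(wald_z_for_slope(xs, ys)) > 1.96:  # two-sided test at alpha = .05
            hits += 1
    return hits / n_sims
```

Calling `estimate_power(n)` for a range of `n` and picking the smallest value whose estimate exceeds 0.8 mirrors the logic described above, although the numbers from this simplified sketch should not be expected to match the reported ones.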
Young adults (mean age = 19.6) were recruited through the Northwestern University Linguistics department
subject pool (compensated by course credit) or recruited from the Northwestern community using flyers
(compensated by $7). Participants were recruited until a total of 50 met the following inclusion criteria:
self-reported learning of English starting before age 1, no uncorrected vision or hearing impairment, and use of
a variety of English with a productive flapping process. This last criterion was verified by the experimenter
during reading of the practice items, which included non-variable flapping contexts (e.g., /t/ in writer, which
is rarely pronounced without a flap in the target varieties of English). Most participants reported learning
English in the United States, one in Australia, and two in India. Participants’ self-reported reading proficiency
in English on a 10-point scale varied between 8 and 10 (mean = 9.52), and 34 reported knowing a
language other than English (2 declined to answer).
The phrases used for this experiment were constructed with several considerations in mind. The adjectives
for the critical items were selected from a range of frequencies, and adjective frequency was included as a
covariate in the statistical analysis. Assuming that the retrieval of word forms is initiated sequentially, as
suggested by, e.g., Alario et al. (2002) and Levelt & Meyer (2000), the frequency of the adjective would affect
how quickly planning of the noun can begin, and potentially modulate whether noun frequency itself could
have an effect. The adjectives also varied in length between one and three syllables. Previous work on dual
picture naming suggests that the time alignment between initiation of articulation and gaze to the second
object, which indexes planning of the second object’s name, differs depending on whether the first object
name is one or three syllables long (Griffin, 2003; Meyer et al., 2007). Meyer et al. (2007) propose that
this is because when the first word is longer, speakers have additional time during the articulation of the
first word to plan the second word. Therefore in our experiments, we might expect that longer adjectives
will be more likely to use the flap variant. All nouns were two syllables long, but varied in stress pattern.
Nouns with unstressed initial syllables are expected to have higher flapping rates (De Jong, 1998). Therefore,
following a reviewer’s suggestion, noun stress was added as a covariate in the statistical model, despite not
being included in the pre-registered analysis plan.
As a final key control to isolate effects of advance planning, the phrases were constructed so as to avoid
existing common phrases. Previous work suggests that frequently used phrases may be stored in memory as
a single unit (Bybee, 2002) and therefore may have idiosyncratic phonetic realizations.
Items were prepared by selecting 40 adjectives ending in /t/ from the SUBTLEX-US database (Brysbaert
& New, 2009), spanning a range of lexical frequencies (Zipf values[2] between 1.8 and 5.9, M = 3.7, SD =
1.1). The length of the adjective varied between one and three syllables (15 one-syllable, 16 two-syllable,
and 9 three-syllable adjectives). The one-syllable adjectives had a higher mean frequency (M = 4.1), but
this is controlled for in the statistical analysis.
[2] The Zipf scale, proposed by Van Heuven et al. (2014), is equivalent to log10(frequency per million words) + 3. Words on this scale
are normally distributed between about 1 and 7, making values intuitively easy to compare. For reference, 1 Zipf corresponds to a
frequency of 0.01 per million words, 3 corresponds to 1 per million, and 6 corresponds to 1000 per million.
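The Zipf values used for the frequencies above can be converted to and from frequency per million words with a pair of one-line functions. This is a small sketch of the definition given by Van Heuven et al. (2014), log10(frequency per million) + 3; the function names are chosen here for illustration.

```python
import math


def zipf_from_per_million(freq_per_million):
    """Zipf value = log10(frequency per million words) + 3."""
    return math.log10(freq_per_million) + 3.0


def per_million_from_zipf(zipf):
    """Inverse conversion: frequency per million words for a Zipf value."""
    return 10.0 ** (zipf - 3.0)
```

On this scale the adjective range of 1.8 to 5.9 Zipf spans roughly four orders of magnitude in raw frequency.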
Each adjective was then paired with a unique low (Zipf value M = 2.3, SD = 0.4), medium (Zipf value
M = 3.4, SD = 0.3), and high frequency (Zipf value M = 4.5, SD = 0.3) vowel-initial noun, plus a
consonant-initial noun as a distractor.
These adjective-noun bigrams were either unattested or extremely low frequency in the Google N-gram
corpus (Michel et al., 2011). This yielded a total of 120 critical bigrams (3 per item). Note that frequency
bins were used only during item preparation, and continuous values were used for statistical analysis. The
full list of items is given in Appendix A.
Design and Procedure
Participants first saw 10 practice trials with unrelated items. Then, adjectives were presented once per
block paired with one of their four corresponding nouns. The proportion of high/medium/low frequency
and consonant-initial nouns was balanced across 4 lists, so each block consisted of 10 high frequency nouns,
10 medium frequency, 10 low frequency and 10 consonant-initial, plus 10 non-flapping fillers for a total
of 50 trials per block. Each list was presented in a random order, then presented again in a random order.
Within blocks, item order was also randomized. Participants were permitted to take breaks between blocks
for as long as needed.
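The trial counts implied by this design can be checked with a few lines of arithmetic. The sketch below assumes that the 4 lists, each presented twice, give the 8 blocks referred to in the statistical model, and that "critical" trials are those with vowel-initial nouns; the second point is an inference from the description, not something stated explicitly. The variable names are illustrative.

```python
# Composition of one block, as described in the Design and Procedure section.
TRIALS_PER_BLOCK = {
    "high_frequency": 10,
    "medium_frequency": 10,
    "low_frequency": 10,
    "consonant_initial": 10,
    "non_flapping_filler": 10,
}
N_LISTS = 4
PRESENTATIONS_PER_LIST = 2  # each list is presented twice, in a new random order


def design_counts(n_participants):
    """Return block and trial counts implied by the design."""
    n_blocks = N_LISTS * PRESENTATIONS_PER_LIST
    per_block = sum(TRIALS_PER_BLOCK.values())
    # Assumption: only vowel-initial (high/medium/low frequency) trials are critical.
    critical_per_block = sum(
        n for kind, n in TRIALS_PER_BLOCK.items() if kind.endswith("_frequency")
    )
    return {
        "blocks": n_blocks,
        "trials_per_block": per_block,
        "trials_per_participant": n_blocks * per_block,
        "critical_per_participant": n_blocks * critical_per_block,
        "critical_total": n_blocks * critical_per_block * n_participants,
    }
```

Under these assumptions, 50 participants yield 12000 critical trials, which agrees with the total reported in the Data Exclusions section.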
After providing informed consent, participants completed a language background questionnaire. Then,
participants sat in a sound-proof room at a comfortable reading distance from a computer display. Instructions
and stimuli were displayed in 36pt white font on a black background. Written and verbal instructions were
given to participants asking them to read aloud the phrases on the screen as quickly as possible once they
appeared. Each trial began with a white fixation dot presented in the center of the screen for a randomly
selected interval of 250 msec, 500 msec, 750 msec, or 1000 msec. The interval was varied in order to
prevent participants from falling into a repetitive, list-like intonation, which was observed during piloting
(pilot participants were not included in the analysis). The stimulus was then presented for 1500 msec, and
was followed by a blank screen for 500 msec. Then the next trial started. The experiment was run using the
open-source software OpenSesame (Mathôt et al., 2012), and the code used to run the experiment is available
on the OSF page.
Acoustic Analysis
Sound files were automatically segmented into individual trials and aligned with orthographic transcriptions.
Phone-level alignments were generated with the Montreal Forced Aligner (McAuliffe et al., 2017, v 1.0.0)
using the English acoustic model provided with the software. The portion of each trial corresponding to
an adjective-final /t/ phone was analyzed using a custom Praat script based on the method described in
Eager (2015). In addition to percentage of voicing during the /t/ interval, the following acoustic measures
were extracted: duration of the adjective, noun, and vowels surrounding the /t/, based on the force-aligned
intervals; reaction time, based on the interval between presentation of the go signal (same as stimulus onset) and the onset of speech;
and the presence of a pause between the adjective and noun, determined by whether a “silence” phone was
inserted by the forced alignment algorithm (minimally 30 msec due to the size of the analysis window).
The dependent measure of whether or not the speaker used a flap was based on the percentage of voicing
during closure, which is one of the main variables that contribute to perception of a flap (De Jong, 1998).[3]
The optimal cut-off point was selected by comparing classification performance on the basis of annotations
prepared by OKC. Data for 13 participants was annotated during data collection, yielding 2312 annotated
tokens. In order to quantify the reliability of the percentage of voicing as an indicator of flapping, we assessed
the classification accuracy of 3 discretized versions of the voicing measure, with cut-offs at ≥50%, ≥90%,
and 100%. The best overall performance came from setting the cut-off at ≥90% voicing, which yielded a
balanced accuracy score of 0.88, with a sensitivity of 0.88 and specificity of 0.89. These rates are comparable
to inter-annotator reliability reported in previous studies (Raymond et al., 2002). Annotations and analysis
scripts are available on the OSF page.
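The cut-off comparison described above amounts to thresholding the voicing percentage and scoring the result against the hand annotations. The following sketch shows one way to compute sensitivity, specificity, and balanced accuracy for a given cut-off. The toy voicing values and labels are invented for illustration and are not the study's data; the authors' own scripts are on the OSF page.

```python
def classify_flaps(voicing_fractions, cutoff=0.90):
    """Label a token as a flap (1) if its voiced fraction is at or above the cut-off."""
    return [1 if v >= cutoff else 0 for v in voicing_fractions]


def classification_scores(predicted, annotated):
    """Return (sensitivity, specificity, balanced accuracy) against annotations."""
    tp = sum(1 for p, a in zip(predicted, annotated) if p == 1 and a == 1)
    fn = sum(1 for p, a in zip(predicted, annotated) if p == 0 and a == 1)
    tn = sum(1 for p, a in zip(predicted, annotated) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, annotated) if p == 1 and a == 0)
    sensitivity = tp / (tp + fn)   # proportion of annotated flaps that were detected
    specificity = tn / (tn + fp)   # proportion of non-flaps correctly rejected
    return sensitivity, specificity, (sensitivity + specificity) / 2.0


# Hypothetical toy data: fraction of the /t/ interval that is voiced,
# and an annotator's judgment (1 = flap, 0 = not a flap).
toy_voicing = [1.0, 0.95, 0.40, 0.92, 0.10, 0.85]
toy_labels = [1, 1, 0, 0, 0, 1]

for cutoff in (0.50, 0.90, 1.00):
    scores = classification_scores(classify_flaps(toy_voicing, cutoff), toy_labels)
    print(cutoff, scores)
```

Balanced accuracy is used rather than raw accuracy because it weights flaps and non-flaps equally even when one category is much more common.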
Data Exclusions
The total number of trials collected from qualifying participants was 12000. Trials in which the participant
said the wrong word, restarted, or pronounced the target phrase incorrectly were excluded (n = 128), as were
trials in which automatic detection of reaction time or voicing failed (n = 31). We adopted two additional
exclusion criteria which were not pre-registered. From the subset with errors removed, participants’ mean
response times were calculated, and responses more than 3 SDs away from the participant’s mean were
discarded (n = 129). Finally, we discarded trials in which the participant paused between the adjective
and the noun (n = 1731). This is because flapping almost never occurs before a pause in English. Of the
2312 tokens annotated by hand, 238 were subsequently determined to have a pause, and only one had been
perceived as a flap by the annotator. In total, these exclusions resulted in a loss of 16.82% of the data, leaving
9982 observations for analysis.
[3] While our discussion focuses on the discrete, categorical aspects of variation, there are also gradient aspects (De Jong, 1998),
some of which are likely encoded within phonetic processing.
Figure 1: Waveform (top), spectrogram (center) and automatic phone-level alignments (bottom) illustrating
acoustic profiles of different possible /t/ realizations from the same speaker saying the same phrase (cute iceberg). A
flap is shown on the left, with periodic vocal fold vibration throughout, and a more isolation-like glottal stop
pronunciation on the right, with aperiodic vibration preceding a short silence before onset of the noun.
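The per-participant response-time trimming described in the Data Exclusions section can be sketched as below. This is an illustrative implementation under stated assumptions: it uses the sample standard deviation, and the data structure (a dictionary from a participant ID to that participant's reaction times) is hypothetical.

```python
import statistics


def trim_reaction_times(rts_by_participant, n_sd=3.0):
    """Drop responses more than n_sd standard deviations from each participant's mean."""
    kept = {}
    for participant, rts in rts_by_participant.items():
        mean = statistics.mean(rts)
        # Assumption: sample SD; with fewer than two trials nothing is trimmed.
        sd = statistics.stdev(rts) if len(rts) > 1 else 0.0
        kept[participant] = [
            rt for rt in rts if sd == 0.0 or abs(rt - mean) <= n_sd * sd
        ]
    return kept
```

Computing the mean and SD separately for each participant keeps generally slow speakers from being trimmed more heavily than fast ones.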
Statistical model
Flapping and reaction times were modeled with mixed-effects regressions, implemented using the lme4 (Bates
et al., 2013, v. 1.1-23) package in R (R Core Team, 2013, v. 1.3.959). An R Notebook detailing model
specifications and outputs is available on the OSF page.
Reaction times were analyzed with a linear regression with the response variable in milliseconds, log-
transformed to approach normality. Flapping was analyzed with a logistic regression, with a criterion of over
90% voicing during the aligned /t/ interval to be counted as a flap, represented with a response value of 1.
Fixed effects included adjective and noun frequency values on the Zipf scale (centered by subtracting
3.5). Since previous work suggests that length may affect the time course of inter-word phonological and
articulatory planning (Meyer et al., 2007), we included adjective length in syllables (centered by subtracting
2). Noun stress, which had not been pre-registered, was included as a sum-coded variable, with the positive
value (0.5) indicating initial stress, and the negative one (−0.5) indicating final stress. Interaction terms were
also included in the model, but had not been pre-registered. Given that facilitation of adjective planning could
itself allow earlier planning of the noun, we included two- and three-way interactions of length, adjective
frequency, and noun frequency.[4]
Additionally, block number (1 through 8, centered by subtracting 4) was included as a control variable.
In the model for flapping, an additional (not pre-registered) control variable was speech rate (phones per
second, excluding the interval corresponding to /t/). Speech rate was z-score normalized within speaker.
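The predictor coding described above (centering, sum-coding, and within-speaker normalization) can be sketched as follows. The centering constants and the ±0.5 stress coding come from the text; the function and key names are illustrative, and the use of the sample standard deviation for the z-scores is an assumption.

```python
import statistics


def code_predictors(adj_zipf, noun_zipf, adj_syllables, block, noun_initial_stress):
    """Center and code the fixed-effect predictors for one trial."""
    return {
        "adj_freq": adj_zipf - 3.5,        # Zipf scale, centered at 3.5
        "noun_freq": noun_zipf - 3.5,      # Zipf scale, centered at 3.5
        "adj_length": adj_syllables - 2,   # syllables, centered at 2
        "block": block - 4,                # blocks 1 through 8, centered at 4
        "noun_stress": 0.5 if noun_initial_stress else -0.5,  # sum coding
    }


def zscore_within_speaker(rates_by_speaker):
    """Z-score speech rate (phones per second) separately for each speaker."""
    normalized = {}
    for speaker, rates in rates_by_speaker.items():
        mean = statistics.mean(rates)
        sd = statistics.stdev(rates)  # assumption: sample SD
        normalized[speaker] = [(r - mean) / sd for r in rates]
    return normalized
```

With this coding, the model intercept corresponds to a phrase whose adjective and noun both have a Zipf frequency of 3.5, with a two-syllable adjective, in block 4, at the speaker's average speech rate.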
The random effects structure included random intercepts for participants and items, random slopes for
noun frequency by item, and adjective frequency and noun frequency by participant. The flapping model also
had a by-participant random slope for adjective length. Correlations between the random effects terms were
dropped since including them in the model specification yielded a singular fit. Inclusion of random slopes
for the interactions between noun frequency, adjective frequency, and adjective length did not significantly
increase goodness of fit, so they were excluded, as recommended by the selection procedure outlined in
Bates et al. (2015).
[4] In additional analyses, we fit a model of flapping which also included subject-level and trial-level reaction time measures
(following Buz & Jaeger, 2016; Fink et al., 2018; Goldrick et al., 2019), but there were no significant effects, so reaction time
predictors are excluded from the models reported below.
Table 1: Fixed-effects estimates for linear regression of reaction times in Experiment 1. P-values are estimated
with Satterthwaite’s degrees of freedom method from the lmerTest package (Kuznetsova et al.,
2017). Random effects are reported on the OSF page.
Term                                 β          SE(β)      df         t-value    Pr(>|t|)
(Intercept)                          2.79100    0.00750    60.93      374.54     < 0.001
Adjective Frequency                 -0.01300    0.00260    40.25       -4.90     < 0.001
Noun Frequency                      -0.00530    0.00130    46.93       -4.20     < 0.001
Adjective Length                     0.01200    0.00340    36.17        3.40     0.002
Block Number                        -0.00300    0.00025    9803.61    -12.00     < 0.001
Adjective*Noun Frequency            -0.00034    0.00110    34.08       -0.32     0.752
Adjective Frequency*Length          -0.00690    0.00310    36.34       -2.20     0.034
Noun Frequency*Adjective Length      0.00100    0.00150    35.97        0.67     0.506
Adj Freq*Noun Freq*Adj Length        0.00230    0.00130    32.73        1.71     0.096
Reaction time
Full regression results are shown in Table 1. Consistent with previous work on single-word (Griffin & Bock,
1998; Jescheniak & Levelt, 1994; Oldfield & Wingfield, 1964) and multi-word utterances (Alario et al.,
2002; Konopka, 2012), reaction times were significantly faster when the phrase-initial adjective was higher
in frequency (β̂ = −0.013, p < 0.001). Replicating Alario et al. (2002), reaction times were also faster
when the noun was higher in frequency, with the effect about half the size of the adjective frequency effect
(β̂ = −0.0053, p < 0.001).
There was an increase in reaction times for phrases beginning with longer adjectives (β̂ = 0.012, p =
0.002), consistent with previous work (Sternberg et al., 1978; Wheeldon & Lahiri, 1997; Wynne et al.,
2018). Length significantly interacted with adjective frequency (β̂ = −0.0069, p = 0.034), reflecting an
enhancement of the frequency effect for long adjectives. No other interactions reached statistical significance
(|t|s < 2).
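To give a sense of scale for the reaction time coefficients, the estimates in Table 1 can be back-transformed to milliseconds. The sketch below assumes the log transform was base 10, which is inferred from the size of the intercept and is not stated in the text; it uses main effects only, with all other predictors held at their centered reference value of zero.

```python
# Main-effect estimates taken from Table 1 (log-transformed reaction times).
RT_COEFS = {"intercept": 2.791, "adj_freq": -0.013, "noun_freq": -0.0053}


def predicted_rt_ms(adj_freq_c=0.0, noun_freq_c=0.0):
    """Predicted reaction time in ms, assuming a base-10 log scale (an assumption)."""
    log_rt = (RT_COEFS["intercept"]
              + RT_COEFS["adj_freq"] * adj_freq_c
              + RT_COEFS["noun_freq"] * noun_freq_c)
    return 10.0 ** log_rt


def percent_change_per_unit(beta):
    """Percent change in reaction time for a one-unit increase in a predictor."""
    return (10.0 ** beta - 1.0) * 100.0
```

Under that assumption, a one-unit increase in adjective frequency on the Zipf scale corresponds to reaction times that are about 3% shorter, and the same increase in noun frequency to a reduction of a little over 1%.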
Table 2: Fixed-effects estimates for logistic regression model of flap use in Experiment 1. P-values are
estimated with Satterthwaite’s degrees of freedom method from the lmerTest package (Kuznetsova et al.,
2017). Random effects are reported on the OSF page.
Term                                 β         SE(β)    z-value    Pr(>|z|)
(Intercept)                         -2.0000    0.210     -9.60     < 0.001
Adjective Frequency                  0.2600    0.086      3.07     0.002
Noun Frequency                       0.1600    0.045      3.63     < 0.001
Adjective Length                     0.2900    0.130      2.21     0.027
Block Number                         0.1900    0.014     13.53     < 0.001
Speech Rate                          0.6600    0.084      7.92     < 0.001
Noun Initial Stress                 -0.9400    0.100     -9.10     < 0.001
Adjective*Noun Frequency             0.0820    0.038      2.15     0.031
Adjective Frequency*Length          -0.1500    0.100     -1.40     0.15
Noun Frequency*Adjective Length      0.0091    0.055      0.17     0.868
Adj Freq*Noun Freq*Adj Length       -0.0350    0.047     -0.75     0.452
Full regression results for flap use are shown in Table 2. As illustrated in Figure 2, higher frequency nouns
were significantly more likely to appear with a flap ($\hat{\beta} = 0.16$, $p < 0.001$), consistent with our
prediction of a noun frequency effect. Flapping was also more likely with high frequency adjectives
($\hat{\beta} = 0.26$, $p = 0.002$). As shown in Figure 4, these two factors interacted ($\hat{\beta} = 0.082$,
$p = 0.031$), such that noun frequency effects were strongest when the adjective was also frequent.
There was a significant effect of adjective length ($\hat{\beta} = 0.29$, $p = 0.027$), and no interactions
with length were significant. As for the control covariates, flapping was more likely in later blocks
($\hat{\beta} = 0.19$, $p < 0.001$) and at faster speaking rates ($\hat{\beta} = 0.66$, $p < 0.001$), and less
likely when the noun had initial stress ($\hat{\beta} = -0.94$, $p < 0.001$).
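The estimates in Table 2 are on the log-odds scale, which can be hard to read directly. The following minimal sketch converts a few of them to odds ratios by exponentiating. It is an interpretive aid only, not part of the authors' analysis code. The dictionary keys and function names are hypothetical, and the size of "one unit" depends on how each predictor was scaled, which is not specified in this section.

```python
import math

# Fixed-effect estimates (log-odds) copied from Table 2 (Experiment 1).
# The dictionary keys are hypothetical labels chosen for this sketch.
coefs = {
    "noun_frequency": 0.16,
    "adjective_frequency": 0.26,
    "speech_rate": 0.66,
    "noun_initial_stress": -0.94,
}


def odds_ratio(beta: float) -> float:
    """Multiplicative change in the odds of flapping per one-unit increase."""
    return math.exp(beta)


def inv_logit(log_odds: float) -> float:
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


odds_ratios = {name: odds_ratio(beta) for name, beta in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.2f}")
```

Under these assumptions, a one-unit increase in noun frequency multiplies the odds of flapping by exp(0.16), roughly 1.17, while initial stress on the noun multiplies them by a factor below 1. The helper `inv_logit` is included for converting a summed linear predictor to a probability; applying it to the intercept alone is only meaningful at the reference values of all predictors.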
Our examination of phonetic variation provides new evidence that the advance planning of the noun form
(specifically, the initial vowel) is variable within a speaker; advance planning of the noun form is more likely
as its frequency increases.
Adjective frequency also facilitated planning, working in concert with noun frequency. The interaction
between adjective and noun frequency points towards the sequential nature of word form encoding: if noun
encoding can only start once adjective encoding is complete, very low frequency adjectives can block advance
planning of nouns. By contrast, more frequent adjectives will be finished planning earlier, allowing
more time for noun encoding prior to speech onset.
[Figure 2 plot. Panels: Experiment 1, Experiment 2; x-axis: Noun Frequency (Zipf scale); y-axis: Likelihood of Flapping]
Figure 2: Empirical plot of relationship between flapping rate (discretized) and noun frequency (Zipf scale)
in Experiment 1 (speeded, online responses) and Experiment 2 (delayed responses). Line represents the
estimated probability from a univariate logistic regression model, and shading shows 95% confidence intervals.
Each gray point represents the mean for a unique critical bigram.
Figure 3: Comparison of noun frequency effect size in Experiment 1 (Speeded) and Experiment 2 (Delayed).
Bar height represents the estimated fixed effect size, and error bars represent 95% confidence intervals based
on the subject-level variance of random slope for noun frequency. Individual gray points show subject
random slope estimates.
[Figure 4 plot. Panels: Experiment 1, Experiment 2; x-axis: Noun Frequency (Zipf scale); y-axis: Likelihood of Flapping; legend: adjective frequency (Low Frequency, High Frequency)]
Figure 4: Empirical plot of relationship between flapping rate (discretized) and noun frequency (Zipf scale)
in Experiment 1 (speeded, online responses) and Experiment 2 (delayed responses). Panels show data split
into upper and lower quantiles by adjective frequency. Line represents the estimated probability from a
univariate logistic regression model, and shading shows 95% confidence intervals. Each point represents
the mean for a unique critical bigram, colored by adjective frequency quantile.
Experiment 2
Experiment 2 was identical to Experiment 1 except that a delay was enforced between presentation of the
phrase and the cue for participants to give their response. This was intended to give participants extra time to
retrieve and prepare phonological details before onset of speech. Accordingly, our hypothesis predicts that
flapping should be overall more likely in this condition. It also predicts that the effect of noun frequency
should be reduced, since the advantage conferred by faster noun retrieval should be less important when
speakers have plenty of time to retrieve the noun in advance of articulation.
An amendment to the pre-registration was made to detail changes to the participant inclusion criteria, data
exclusion criteria, and model specification. The amendment is available on the OSF page.
Sample size
The target sample size was 50 participants, based on the same power analysis used for Experiment 1. However,
recruitment was interrupted by safety measures that the authors’ home university imposed to mitigate
the Covid-19 pandemic. Therefore, only 42 eligible participants could be included in this study.
Participants were recruited through the Northwestern University Linguistics department subject pool
(compensated by course credit) or from the Northwestern community using flyers (compensated by $7).
None had participated in Experiment 1; all started learning English before age 1 and reported no uncorrected
vision or hearing impairment. Ages ranged from 18 to 22 (M = 19.1), and participants’ gender
self-identifications were 26 female, 1 non-binary, and 15 male. All participants spoke varieties of English
that include flapping (the majority learned English in the United States, one in Korea, and one in Guyana).
Participants’ self-reported reading proficiency in English on a 10-point scale ranged from 8 to 10 (M = 9.31),
and 34 reported knowing a language other than English.
The materials were identical to Experiment 1.
Design and Procedure
The design was identical to Experiment 1, with a small change in the procedure. Written and verbal
instructions asked participants to read the phrases on the screen silently, then say the phrase
aloud as quickly as possible once the green circle appeared on the screen. Each trial began with a white
fixation dot presented in the center of the screen for 500 msec, followed by presentation of the stimulus phrase.
The phrase remained on the screen for a randomly selected interval of 1250 msec, 1500 msec, 1750 msec,
or 2000 msec. The phrase was then masked by six black Xs with a large green circle above them. This cue
for participants to initiate their response stayed on the screen for 600 msec, followed by a black screen for
900 msec before the beginning of the next trial. The experiment was run using the open-source software
OpenSesame (Mathôt et al., 2012), and the code used to run the experiment is available on the OSF page.
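The trial structure described above can be summarized as a simple timeline. The sketch below is plain Python written for illustration; it is not the authors' OpenSesame script and does not use the OpenSesame API. All names are hypothetical, and it assumes the four phrase durations were sampled with equal probability, which the text does not state.

```python
import random
from dataclasses import dataclass

# Durations taken from the procedure description (all in msec).
FIXATION_MS = 500
PHRASE_DURATIONS_MS = (1250, 1500, 1750, 2000)
CUE_MS = 600
BLANK_MS = 900


@dataclass
class TrialTimeline:
    """Durations of the four phases of one delayed-response trial."""
    fixation_ms: int
    phrase_ms: int
    cue_ms: int
    blank_ms: int

    @property
    def cue_onset_ms(self) -> int:
        """Time from trial start until the green-circle response cue appears."""
        return self.fixation_ms + self.phrase_ms

    @property
    def total_ms(self) -> int:
        """Total trial duration."""
        return self.fixation_ms + self.phrase_ms + self.cue_ms + self.blank_ms


def make_trial(rng: random.Random) -> TrialTimeline:
    """Build one trial, drawing the phrase duration uniformly (an assumption)."""
    phrase_ms = rng.choice(PHRASE_DURATIONS_MS)
    return TrialTimeline(FIXATION_MS, phrase_ms, CUE_MS, BLANK_MS)


rng = random.Random(2020)
for _ in range(3):
    trial = make_trial(rng)
    print(trial.phrase_ms, trial.cue_onset_ms, trial.total_ms)
```

With these durations, the response cue appears between 1750 and 2500 msec after trial onset, and a full trial lasts between 3250 and 4000 msec.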
Data Exclusions
The total number of trials collected from qualifying participants was 10080. Trials in which the participant
said the wrong word, restarted, or pronounced the target phrase incorrectly were excluded (n = 80), as were
trials in which automatic detection of reaction time or voicing failed (typically due to participants speaking on
or before the response prompt, n = 69). From the subset with errors removed, each participant’s mean response
time was calculated, and responses more than 3 SDs from that participant’s mean were discarded
(n = 126). As specified in the amendment to the pre-registration (registered prior to data collection for
Experiment 2), trials in which the participant paused between the adjective and the noun were discarded
(n = 786). In total, these exclusions resulted in a loss of 10.71% of the data, leaving 9000 observations for
analysis.
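The per-participant 3 SD criterion described in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code (which is on the OSF page). The function name and data layout are hypothetical, and the sketch assumes the sample standard deviation computed on whatever response-time scale was used for trimming, neither of which is specified here.

```python
from statistics import mean, stdev


def trim_outliers(rts_by_participant, n_sd=3.0):
    """Keep responses within n_sd standard deviations of each participant's mean.

    rts_by_participant maps a participant ID to a list of response times.
    The mean and SD are computed separately for each participant.
    """
    kept = {}
    for participant, rts in rts_by_participant.items():
        m = mean(rts)
        sd = stdev(rts)  # sample SD; requires at least two responses
        kept[participant] = [rt for rt in rts if abs(rt - m) <= n_sd * sd]
    return kept


# Hypothetical data: one participant with a single very slow response.
example = {"p01": [400.0] * 19 + [2000.0]}
trimmed = trim_outliers(example)
print(len(example["p01"]), len(trimmed["p01"]))
```

Because the cutoff is defined relative to each participant's own distribution, a response that is slow for one participant may be retained for another.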
Statistical model
Models for reaction time and flapping were fit with the same specifications as in Experiment 1, reported
in Tables 3 and 4. An additional cross-experiment analysis, which was not pre-registered, was conducted
for flapping to test the reliability of effect size differences between the two experiments. The same model
specification was used as for previous flapping models, with the addition of a “condition” variable which
was set to 0 for speeded responses (Experiment 1) and to 1 for the delayed responses (Experiment 2). This
variable was allowed to interact with each of the other fixed effects. The full model is reported in Appendix
B. Given that the power analysis was not conducted with “condition” or any of its interactions in mind, the
results of this pooled Experiment 1/2 model should be taken as exploratory rather than confirmatory, and
should be confirmed through future replications.
Table 3: Fixed-effects estimates for linear regression of reaction times in Experiment 2 (delayed
responses). P-values are estimated with Satterthwaite’s degrees of freedom method from the lmerTest
package (Kuznetsova et al., 2017). Random effects are reported on the OSF page.
Predictor                          β           se(β)      df         t-value    Pr(>|t|)
(Intercept)                        2.59400     0.00950    49.22      273.30     < 0.001
Adjective Frequency               -0.00250     0.00310    43.78       -0.78     0.438
Noun Frequency                    -0.00270     0.00130    18.36       -2.20     0.043
Adjective Length                   0.00910     0.00410    36.21        2.22     0.032
Block Number                      -0.00260     0.00043    8822.48     -6.00     < 0.001
Adjective*Noun Frequency           0.00075     0.00100    14.02        0.72     0.483
Adjective Frequency*Length        -0.00620     0.00370    36.40       -1.70     0.106
Noun Frequency*Adjective Length    0.00180     0.00150    16.29        1.21     0.244
Adj Freq*Noun Freq*Adj Length      0.00140     0.00130    11.20        1.08     0.302
Reaction time
Consistent with more advance planning, mean reaction times were faster in Experiment 2 (means for
Experiment 1: 621 msec; Experiment 2: 403 msec). Reaction times were faster for higher frequency nouns
($\hat{\beta} = -0.0027$, $p = 0.043$), but in contrast to Experiment 1 there was no significant effect of
adjective frequency ($\hat{\beta} = -0.0025$, $p = 0.438$).
There was a significant effect of adjective length in Experiment 2 ($\hat{\beta} = 0.0091$, $p = 0.032$).
Unlike Experiment 1, there was no significant interaction between adjective length and frequency
($\hat{\beta} = -0.0062$, $p = 0.106$), nor were any other interactions significant.
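To give a sense of the size of these reaction time effects, the sketch below back-transforms two estimates from Table 3 to milliseconds. It rests on an assumption that is not confirmed in this section: that reaction times were modeled as base-10 logarithms of milliseconds, which the intercept of 2.594 (close to the log10 of the 403 msec mean) suggests. If a different transformation was used, the numbers below do not apply. All variable names are hypothetical.

```python
# ASSUMPTION: the dependent variable is log10(reaction time in msec).
# Estimates copied from Table 3 (Experiment 2).
intercept = 2.594
noun_frequency_beta = -0.0027


def to_msec(log10_rt: float) -> float:
    """Back-transform a log10 reaction time to milliseconds."""
    return 10 ** log10_rt


# Predicted reaction time at the reference values of all predictors.
baseline_ms = to_msec(intercept)

# On a log scale an additive effect is multiplicative after back-transformation.
ratio = 10 ** noun_frequency_beta
percent_change = (ratio - 1.0) * 100.0

print(f"baseline: {baseline_ms:.0f} msec")
print(f"change per unit of noun frequency: {percent_change:.2f}%")
```

Under this assumption, the noun frequency estimate corresponds to a reduction of well under 1% in reaction time per unit of noun frequency, so the effect is reliable but small in absolute terms.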
As predicted, there was significantly more flap use in Experiment 2 (mean: 29.9%), the delayed response
condition, as compared to Experiment 1 (mean: 18.6%; cross-experiment model, delayed condition:
$\hat{\beta} = 0.91$, $p = 0.001$).
As in Experiment 1, higher frequency adjectives were more likely to be flapped ($\hat{\beta} = 0.25$,
$p = 0.007$). Figure 2 (right panel) suggests that noun frequency was also associated with more flap use, but
this effect was not significant ($\hat{\beta} = 0.058$, $p = 0.247$). Consistent with the qualitative
difference in noun frequency effects across experiments, the pooled model finds a significant interaction of
condition and noun frequency effects, illustrated in Figure 3 ($\hat{\beta} = -0.11$, $p = 0.044$). This
suggests that noun frequency effects on flapping are driven, in part, by advance planning.
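The condition by noun frequency interaction can be read as a difference in slopes. Because condition is coded 0 for speeded and 1 for delayed responses, the noun frequency slope in each condition is the main effect plus the interaction term multiplied by the condition code. The following sketch works through that arithmetic using the pooled-model estimates reported in Table 6 (Appendix B). The variable and function names are hypothetical, and the calculation assumes all other predictors that interact with noun frequency are at zero.

```python
# Estimates copied from the pooled model (Table 6, Appendix B).
noun_frequency = 0.17               # slope when condition = 0 (speeded)
condition_by_noun_frequency = -0.11  # change in slope when condition = 1 (delayed)


def noun_frequency_slope(condition: int) -> float:
    """Noun frequency slope (log-odds) for condition 0 (speeded) or 1 (delayed)."""
    return noun_frequency + condition_by_noun_frequency * condition


slope_speeded = noun_frequency_slope(0)
slope_delayed = noun_frequency_slope(1)

print(f"speeded: {slope_speeded:.2f}, delayed: {slope_delayed:.2f}")
```

The implied delayed-condition slope of about 0.06 is close to the noun frequency estimate from the separate Experiment 2 model (0.058), which is what one would expect if the two analyses describe the same pattern.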
The interaction between adjective and noun frequency was not significant, in contrast to Experiment
1 ($\hat{\beta} = 0.059$, $p = 0.16$; see Figure 4). There was a significant three-way interaction between
adjective frequency, adjective length, and noun frequency ($\hat{\beta} = -0.14$, $p = 0.008$), such that the
adjective frequency * noun frequency interaction observed in Experiment 1 was primarily found for
monosyllabic adjectives. No other interactions were significant.
The covariate effects were qualitatively similar to Experiment 1, with positive effects for block number
and speech rate (block number: $\hat{\beta} = 0.10$, $p < 0.001$; speech rate: $\hat{\beta} = 0.74$,
$p < 0.001$), and a negative effect for initial stress on the noun ($\hat{\beta} = -0.76$, $p < 0.001$).
As predicted, allowing more time for planning decreased reaction times and increased the use of flapping.
Critically, the effect of noun frequency was significantly attenuated with delayed responses. This suggests
that noun frequency effects on flapping are driven in part by an advance planning process that varies
depending on task demands.
Table 4: Fixed-effects estimates for logistic regression model of flap use in Experiment 2 (delayed
responses). P-values are estimated with Satterthwaite’s degrees of freedom method from the lmerTest
package (Kuznetsova et al., 2017). Random effects are reported on the OSF page.
Predictor                          β         se(β)    test statistic   p-value
(Intercept)                       -1.200     0.230    -5.00            < 0.001
Adjective Frequency                0.250     0.092     2.71            0.007
Noun Frequency                     0.058     0.050     1.16            0.247
Adjective Length                   0.210     0.140     1.52            0.129
Block Number                       0.100     0.012     8.47            < 0.001
Speech Rate                        0.740     0.080     9.17            < 0.001
Noun Initial Stress               -0.760     0.100    -7.40            < 0.001
Adjective*Noun Frequency           0.059     0.042     1.41            0.16
Adjective Frequency*Length        -0.120     0.110    -1.10            0.268
Noun Frequency*Adjective Length   -0.015     0.060    -0.26            0.796
Adj Freq*Noun Freq*Adj Length     -0.140     0.052    -2.60            0.008
General Discussion
This study investigated the within-speaker dynamics of word form encoding in multi-word utterances. We
focused on a probabilistic phonological pattern in which there is a dependency between two adjacent phonemes
belonging to different words, namely /t/-flapping in English. Since the use of a flap variant requires “look
ahead” to check whether the next word begins with a vowel, the presence of a flap serves as an index of
advance planning. Our results show that in Adjective-Noun phrases, the probability of flap use, and therefore
the degree of advance planning, is based on word-specific utterance characteristics (lexical frequency) in
addition to current task demands. Flapping was more likely to occur when nouns were easier to retrieve (i.e.,
higher frequency). When a response delay was enforced, more advance planning occurred, diminishing the
disadvantage of low frequency nouns and increasing the overall likelihood of flap use.
These findings converge with previous work showing that advance planning can shift as a function of
task demands (Griffin, 2003; Klaus et al., 2017; Meyer et al., 2007; V. Wagner et al., 2010; Wynne et al.,
2018). Our results complement previous work showing that, under the same demands, different speakers
may show different degrees of advance planning (Michel Lange & Laganaro, 2014; Schriefers & Teruel,
1999). This study adds a new key insight to the general concept of flexible planning scope: within speakers,
the degree of advance planning varies continuously from moment to moment, partly as a function of the
accessibility of the form of upcoming words (as indexed by lexical frequency).
Our results also converge with work on phonetic variation in spontaneous speech, supporting the causal
link between advance planning and variation proposed in Kilbourn-Ceron (2017). Kilbourn-Ceron et al.
(2020) investigated flapping in spontaneous speech, and found that higher conditional probability of the
upcoming word given the target word (e.g., the probability of artist coming after great) led to increased
likelihood of flapping. Unlike the present study, they did not find an effect of second-word frequency.
However, the two measures are highly correlated in spontaneous speech, making it difficult to disentangle
their effects. Future work could investigate the effect of conditional probability experimentally, where these
two factors can be de-correlated. Our proposal predicts that there should indeed be an effect proportional
to the influence of conditional probability on the degree of advance planning. Some preliminary supporting
evidence has been found for liaison in French (M. Wagner et al., 2020).
The manipulations in this study targeted individual words (frequency, length) and global task demands
(response delays). It is likely that many other factors could affect the degree of advance planning at the
word form stage. Even within speech planning, the advance planning of word forms must be bounded by the
extent of advance planning at earlier stages, at least according to serial models of speech planning. Future
work should investigate whether delays in processing of semantic and grammatical aspects of the utterance
have downstream consequences for the extent of advance planning at the word form level.
This paper provides new evidence for the dynamic nature of advance planning during word form encoding.
Phonetic variation provides us with a new tool to investigate the scope of planning, moving beyond reaction
time to examine the ongoing nature of planning following the onset of speech.
Author Note
Authorship contributions (following CRediT):
OKC: Conceptualization, Data Curation, Formal analysis, Funding acquisition, Investigation, Methodology,
Project administration, Software, Supervision, Visualization, Writing – original draft, Writing – review &
editing.
MG: Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision,
Writing – review & editing.
We wish to acknowledge the contributions of Chandana Sooranhalli in running participants, and Chun
Liang Chan for technical assistance with software and equipment. We are also grateful to the participants
of the Phonatics discussion group at Northwestern and the audience of LabPhon17 for questions and
comments that improved this work. This research was supported by the Fonds de Recherche du Québec through
Postdoctoral fellowship 2019-B3Z-255232 awarded to OKC.
Open Practices
A detailed analysis plan including methods, sample size, and statistical model specification was pre-registered
prior to data collection for Experiment 1. An amendment to the plan was pre-registered prior to data collection
for Experiment 2. These documents are available on this project’s Open Science Framework page, along
with code for the experimental procedure, code and results of the power analysis, code for the statistical
analysis, dependent measure data, and acoustic data (for those participants who consented to making their
data publicly available). The OSF page can be found at
References
Alario, F.-X., & Caramazza, A. (2002). The production of determiners: Evidence from French. Cognition,
82(3), 179–223.
Alario, F.-X., Costa, A., & Caramazza, A. (2002). Frequency effects in noun phrase production: Implications
for models of lexical access. Language and Cognitive Processes,17(3), 299–319.
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. (arXiv preprint
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2013). lme4: Linear mixed-effects models using Eigen
and S4 [Computer software manual]. Retrieved from
(R package version 1.1-21)
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word
frequency norms and the introduction of a new and improved word frequency measure for American
English. Behavior Research Methods,41(4), 977–990.
Buchwald, A. (2014). Phonetic processing. In M. Goldrick, V. Ferreira, & M. Miozzo (Eds.), The Oxford
handbook of language production (pp. 245–258). Oxford, England: Oxford University Press.
Bürki, A., Frauenfelder, U. H., & Alario, F.-X. (2015). On the resolution of phonological constraints in
spoken production: Acoustic and response time evidence. The Journal of the Acoustical Society of
America,138(4), EL429–EL434. doi: 10.1121/1.4934179
Bürki, A., Laganaro, M., & Alario, F.-X. (2014). Phonologically driven variability: The case of determiners.
Journal of Experimental Psychology: Learning, Memory, and Cognition,40(5), 1348-1362.
Bürki, A., Sadat, J., Dubarry, A.-S., & Alario, F.-X. (2016). Sequential processing during noun phrase
production. Cognition,146, 90 - 99. Retrieved from
article/pii/S0010027715300585 doi:
Buz, E., & Jaeger, T. F. (2016). The (in) dependence of articulation and lexical planning during isolated
word production. Language, Cognition and Neuroscience,31(3), 404–424.
Bybee, J. (2002). Phonological evidence for exemplar storage of multiword sequences. Studies in Second
Language Acquisition, 215-221.
Damian, M. F., & Dumay, N. (2009). Exploring phonological encoding through repeated segments. Lan-
guage and Cognitive Processes,24(5), 685–712.
De Jong, K. (1998). Stress-related variation in the articulation of coda alveolar stops: Flapping revisited.
Journal of Phonetics,26(3), 283–310.
Eager, C. (2015). Automated voicing analysis in praat: Statistically equivalent to manual segmentation.
In T. S. C. for ICPhS 2015 (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences.
University of Glasgow: Glasgow.
Ferreira, F. (1993). Creation of prosody during sentence production. Psychological review,100(2), 233-253.
Ferreira, F., & Swets, B. (2002). How incremental is language production? evidence from the production of
utterances requiring the computation of arithmetic sums. Journal of Memory and Language,46, 57–84.
Fink, A., Oppenheim, G. M., & Goldrick, M. (2018). Interactions between lexical access and articulation.
Language, Cognition and Neuroscience,33(1), 12–24.
Goldrick, M. (2014). Phonological processing: The retrieval and encoding of word form information in
speech production. In M. Goldrick, V. Ferreira, & M. Miozzo (Eds.), The Oxford handbook of language
production (pp. 228–244). Oxford, England: Oxford University Press.
Goldrick, M., McClain, R., Cibelli, E., Adi, Y., Gustafson, E., Moers, C., & Keshet, J. (2019). The influence
of lexical selection disruptions on articulation. Journal of Experimental Psychology: Learning, Memory,
and Cognition,45(6), 1107.
Griffin, Z. M. (2003). A reversed word length effect in coordinating the preparation and articulation of
words in speaking. Psychonomic Bulletin & Review,10(3), 603–609.
Griffin, Z. M., & Bock, K. (1998). Constraint, word frequency, and the relationship between lexical pro-
cessing levels in spoken word production. Journal of Memory and Language,38, 313-338.
Jescheniak, J. D., & Levelt, W. J. (1994). Word frequency effects in speech production: Retrieval of syntactic
information and of phonological form. Journal of Experimental Psychology: Learning, Memory, and
Cognition,20(4), 824.
Kawamoto, A. H., Liu, Q., & Kello, C. T. (2015). The segment as the minimal planning unit in
speech production and reading aloud: evidence and implications. Frontiers in Psychology,6, 1457.
Retrieved from doi:
Kilbourn-Ceron, O. (2017). Speech production planning affects variation in external sandhi (Unpublished
doctoral dissertation). McGill University.
Kilbourn-Ceron, O., Clayards, M., & Wagner, M. (2020). Predictability modulates pronunciation variants
through speech planning effects: A case study on coronal stop realizations. Laboratory Phonology:
Journal of the Association for Laboratory Phonology,11(1), 5. doi: 10.5334/labphon.168
Kilbourn-Ceron, O., Wagner, M., & Clayards, M. (2016). The effect of production planning locality on
external sandhi: a study in /t/. In The proceedings of the 52nd Meeting of the Chicago Linguistics
Society. Retrieved from
Kittredge, A. K., Dell, G. S., Verkuilen, J., & Schwartz, M. F. (2008). Where is the effect of frequency
in word production? insights from aphasic picture-naming errors. Cognitive neuropsychology,25(4),
Klaus, J., Mädebach, A., Oppermann, F., & Jescheniak, J. D. (2017). Planning sentences while doing other
things at the same time: effects of concurrent verbal and visuospatial working memory load. Quarterly
Journal of Experimental Psychology,70(4), 811-831. Retrieved from
17470218.2016.1167926 (PMID: 26985697) doi: 10.1080/17470218.2016.1167926
Konopka, A. E. (2012). Planning ahead: How recent experience with structures and words changes the
scope of linguistic planning. Journal of Memory and Language,66(1), 143–162.
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed
effects models. Journal of Statistical Software,82(13), 1–26. doi: 10.18637/jss.v082.i13
Levelt, W. J., & Meyer, A. S. (2000). Word for word: Multiple lexical access in speech production. European
Journal of Cognitive Psychology,12(4), 433–452.
Liu, Q., Kawamoto, A. H., Payne, K. K., & Dorsey, G. N. (2018). Anticipatory coarticulation and the minimal
planning unit of speech. Journal of Experimental Psychology: Human Perception and Performance,
44(1), 139.
Mathôt, S., Schreij, D., & Theeuwes, J. (2012). OpenSesame: An open-source, graphical experiment
builder for the social sciences. Behavior Research Methods,44(2), 314–324. Retrieved from https:// doi: 10.3758/s13428-011-0168-7
McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., & Sonderegger, M. (2017). Montreal Forced Aligner:
Trainable text-speech alignment using Kaldi. In Proceedings of the 18th conference of the International
Speech Communication Association (pp. 498–502).
Meyer, A. S. (1996). Lexical access in phrase and sentence production: Results from picture–word interfer-
ence experiments. Journal of Memory and Language,35(4), 477–496.
Meyer, A. S., Belke, E., Häcker, C., & Mortensen, L. (2007). Use of word length information in utterance
planning. Journal of Memory and Language,57(2), 210–231. doi: 10.1016/j.jml.2006.10.005
Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Pickett, J. P., … others (2011). Quantitative
analysis of culture using millions of digitized books. Science,331(6014), 176–182.
Michel Lange, V., & Laganaro, M. (2014). Inter-subject variability modulates phonological advance planning
in the production of adjective-noun phrases. Frontiers in Psychology,5. doi: 10.3389/fpsyg.2014.00043
Miozzo, M., & Caramazza, A. (1999). The selection of determiners in noun phrase production. Journal of
Experimental Psychology: Learning, Memory, and Cognition,25(4), 907.
Oldfield, R. C., & Wingfield, A. (1964). The time it takes to name an object. Nature,202, 1031–1032.
Oppermann, F., Jescheniak, J. D., & Schriefers, H. (2010). Phonological advance planning in sen-
tence production. Journal of Memory and Language,63(4), 526 - 540. Retrieved from http://www doi:
O’Seaghdha, P. G., Chen, J.-Y., & Chen, T.-M. (2010). Proximate units in word production: Phonological
encoding begins with syllables in mandarin chinese but with segments in english. Cognition,115(2),
Pierrehumbert, J. B. (2001). Stochastic phonology. Glot international,5(6), 195-207.
R Core Team. (2013). R: A language and environment for statistical computing [Computer software manual].
Vienna, Austria. Retrieved from
Raymond, W. D., Pitt, M., Johnson, K., Hume, E., Makashay, M., Dautricourt, R., & Hilts, C. (2002).
An analysis of transcription consistency in spontaneous speech from the Buckeye corpus. In Seventh
International Conference on Spoken Language Processing.
Schnur, T. T. (2011). Phonological planning during sentence production: Beyond the verb. Frontiers in
Psychology,2. Retrieved from doi: 10.3389/
Schnur, T. T., Costa, A., & Caramazza, A. (2006, Feb). Planning at the phonological level during sentence
production. Journal of Psycholinguistic Research,35(2), 189–213. doi: 10.1007/s10936-005-9011-6
Schriefers, H. (1999). Phonological facilitation in the production of two-word utterances. European Journal
of Cognitive Psychology,11(1), 17–50.
Schriefers, H., & Teruel, E. (1999). The production of noun phrases: A cross linguistic comparison of
French and German. In Proceedings of the 21st annual conference of the Cognitive Science Society
(p. 637-642). Mahwah, NJ: Erlbaum.
Spalek, K., Bock, K., & Schriefers, H. (2010). A purple giraffe is faster than a purple elephant: Inconsistent
phonology affects determiner selection in English. Cognition,114(1), 123 - 128. doi:
Sternberg, S., Monsell, S., Knoll, R. L., & Wright, C. E. (1978). The latency and duration of rapid movement
sequences: Comparisons of speech and typewriting. In G. Stelmach (Ed.), Information processing in
motor control and learning (pp. 117–152). New York: Academic Press.
Tamminga, M., MacKenzie, L., & Embick, D. (2016). The dynamics of variation in individuals. Linguistic
Variation,16(2), 300–336.
Tanner, J., Sonderegger, M., & Wagner, M. (2017). Production planning and coronal stop deletion in
spontaneous speech. Laboratory Phonology,8(1).
Van Heuven, W. J., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). SUBTLEX-UK: A new and improved
word frequency database for British English. Quarterly Journal of Experimental Psychology,67, 1176-
Wagner, M. (2012). Locality in phonology and production planning. In J. Loughran & A. McKillen (Eds.),
Proceedings of phonology in the 21 century: Papers in honour of Glyne Piggott (Vol. 22, pp. 1–18).
Montreal, QC.
Wagner, M., Lachapelle, J., & Kilbourn-Ceron, O. (2020). Liaison and the locality of production planning.
Poster presented at the 17th Conference on Laboratory Phonology, July 6–8. [Online].
Wagner, V., Jescheniak, J. D., & Schriefers, H. (2010). On the flexibility of grammatical advance planning
during sentence production: effects of cognitive load on multiple lexical access. Journal of Experimental
Psychology: Learning, Memory, and Cognition,36(2), 423.
Whalen, D. (1990). Coarticulation is largely planned. Journal of Phonetics,18, 3–35.
Wheeldon, L. R., & Lahiri, A. (1997). Prosodic units in speech production. Journal of Memory and
Language,37, 356-381.
Wheeldon, L. R., & Lahiri, A. (2002). The minimal unit of phonological encoding: Prosodic or lexical
word. Cognition,85(2), 31–41.
Wynne, H. S., Wheeldon, L., & Lahiri, A. (2018). Compounds, phrases and clitics in connected speech.
Journal of Memory and Language,98, 45–58.
A Experimental items
Table 5: List of materials used in Experiments 1 and 2. Each row represents one item, showing the adjective
and three unique nouns that were paired with it. For full details including frequency measures, visit this
project’s OSF page.
Adjective    High-frequency noun    Medium-frequency noun    Low-frequency noun    (Consonant)
animate ocean oatmeal acorn soil
erudite office invoice armchairs map
starlit opera autumn expanse ridge
tacit arrest exile accords retreat
russet island apron okra cellar
ornate army outpost archway ransom
literate owner ethic outcry rumor
inert agent error inbox lemur
fraught estate illness upkeep mask
taut order eyebrow airstrike mummy
trite issue export anthems moral
petite event oven icons shorts
upbeat attempt outlaw envoys shepherd
moot effect absence allure roads
sedate adult insect oboe knight
distraught affair outing aardvark lighter
mute uncle outcast embers niece
adequate island eggnog oxfords lipstick
curt exchange athlete emcee ranger
scarlet outfit icing anvil robe
elaborate alarm archive acclaim method
concrete address abyss alcove ceiling
delicate apple onion aloe sausage
slight effort impact algae menu
corporate ally orchid android relative
neat eagle almond adverb rider
polite action ogre orca maniac
remote offense orbit armoire shield
favorite artist oyster easel saddle
opposite interest ointment eyedrop yacht
wet airplane ostrich urchins stream
bright evening opal orchards necklace
complete exit altar antler madness
private angle eclipse igloo web
fat actor eggplant earthworm shrimp
cute airport iceberg aussie monkey
quiet award exhaust alehouse raid
sweet alley amber anklet rash
late image intake ingot symbol
great exam atlas airwave spoon
B Logistic model comparing experiments
Table 6: Fixed-effects estimates for logistic mixed model of pooled data from Experiments 1 and 2. P-values
are estimated with Satterthwaite’s degrees of freedom method from the lmerTest package (Kuznetsova et
al., 2017). Random effects are reported on the OSF page.
Predictor                                     β          se(β)    test statistic   p-value
(Intercept)                                  -2.1000     0.210    -9.700           < 0.001
Condition (Delayed)                           0.9100     0.290     3.190           0.001
Adjective Frequency                           0.2500     0.086     2.860           0.004
Noun Frequency                                0.1700     0.054     3.090           0.002
Adjective Length                              0.2000     0.120     1.660           0.098
Block Number                                  0.1800     0.014    13.440           < 0.001
Speech Rate                                   0.7600     0.078     9.690           < 0.001
Noun Initial Stress                          -0.8000     0.095    -8.400           < 0.001
Condition*Adjective Frequency                -0.0010     0.050    -0.020           0.984
Condition*Noun Frequency                     -0.1100     0.054    -2.000           0.044
Adjective*Noun Frequency                      0.0900     0.046     1.970           0.049
Condition*Adjective Length                   -0.0041     0.064    -0.064           0.949
Adjective Frequency*Length                   -0.1500     0.110    -1.500           0.144
Noun Frequency*Adjective Length               0.0018     0.066     0.027           0.978
Condition*Speech Rate                        -0.0190     0.096    -0.200           0.84
Condition*Block Number                       -0.0860     0.018    -4.700           < 0.001
Condition*Noun Initial Stress                 0.0300     0.100     0.290           0.775
Condition*Adjective Freq*Noun Freq           -0.0220     0.042    -0.530           0.597
Condition*Adjective Freq*Adjective Length     0.0320     0.054     0.590           0.552
Condition*Noun Freq*Adjective Length         -0.0120     0.062    -0.190           0.85
Adj Freq*Noun Freq*Adj Length                -0.0310     0.057    -0.530           0.594
Condition*Adj Freq*Noun Freq*Adj Length      -0.1000     0.051    -2.000           0.049
ResearchGate has not been able to resolve any citations for this publication.
Full-text available
Predictability has been shown to be associated with many dimensions of variation in speech, including durational variation and variable omission of segments. However, the mechanism or mechanisms that underlie these effects are still unclear. This paper presents data on a new aspect of predictability in speech, namely how it affects allophonic variation. We examine two coronal stop allophones in English, flap and glottal stop, and find that their relationship with predictability is quite different from what is expected under current theories of probabilistic reduction in speech. Flapping is more likely when the word that follows is more predictable, but is not influenced by the frequency of the word itself, while glottal stops are more likely in words that are less predictable. We propose that the crucial distinction between these two allophones is how they are conditioned by phonological context. This, we argue, interacts with online speech planning processes and gives rise to variability for context-dependent allophones. This hypothesis offers a specific, testable mechanism for certain predictability effects, and has the potential to extend to other factors that contribute to variability in speech.