All content in this area was uploaded by Sophia Wulfert on Jun 28, 2022
PRODUCTION OF INITIAL CONSONANT CLUSTERS
Speech Errors in the Production of Initial Consonant Clusters:
The Roles of Frequency and Sonority
Sophia Wulfert¹, Peter Auer¹ and Adriana Hanulíková¹,²
¹ Department of German Studies, University of Freiburg, Germany
² Freiburg Institute for Advanced Studies, University of Freiburg, Germany
Author note
Sophia Wulfert 0000-0002-9043-5467
Peter Auer
Adriana Hanulíková 0000-0001-9010-4185
We have no known conflict of interest to disclose.
Correspondence concerning this article should be addressed to:
Sophia Wulfert
Department of German Studies
Albert-Ludwigs-Universität Freiburg
79085 Freiburg
Germany
Email: sophia.wulfert@frequenz.uni-freiburg.de
Abstract
Purpose: One of the central questions in speech production research is to what degree certain structures have an inherent difficulty and to what degree repeated encounter and practice make them easier to process. The goal of this paper was to determine the extent to which frequency and sonority distance of consonant clusters predict production difficulties.
Method: We used a tongue twister paradigm to elicit speech errors on syllable-initial German consonant clusters and investigated the relative influences of cluster frequency and sonority distance between the consonants of a cluster on production accuracy. Native speakers of German produced pairs of monosyllabic pseudowords beginning with consonant clusters at a high speech rate.
Results: Error rates decreased with increasing frequency of the consonant clusters. A high sonority distance, on the other hand, did not facilitate a cluster’s production. In addition, the combination of consonant clusters in a stimulus pair had a great impact on production accuracy.
Conclusions: These results suggest that both frequency of use and sonority distance codetermine production ease, as well as syntagmatic competition between adjacent sound sequences.
Keywords: consonant cluster, phonotactics, speech errors, frequency, sonority, speech production
1. Introduction
Speech production is a highly efficient and automatized process. In everyday communication, speakers produce an average of four to six syllables per second (Reetz & Jongman, 2009), and speech errors, such as the addition, deletion or substitution of a phone, are relatively rare.¹ However, their occurrence has been shown to be systematic (Fromkin, 1971; Nooteboom, 1973). Some linguistic items seem harder to process for the speech production system than others. This raises the question of which factors make a given linguistic item hard or easy to produce. For the lexical level, there is ample evidence that high frequency of use facilitates processing and leads to automatization. Relative to low-frequency (LF) words, high-frequency (HF) lexemes have been shown to be produced more accurately (Andrews, 1992; Jescheniak & Levelt, 1994), faster (Andrews, 1992; Jescheniak & Levelt, 1994; Oldfield & Wingfield, 1965) and with shorter durations (Bell et al., 2009). On a sublexical level, a similar effect has been observed for HF phonemes and syllables: they are often produced with shorter latencies than their LF counterparts (Levelt & Wheeldon, 1994; Mooshammer et al., 2015). Under adverse conditions, as in aphasic speech or in tongue twisters, they are more error-resistant than LF items (Aichert & Ziegler, 2004; Goldrick, 2002, 2004; Levitt & Healy, 1985). This has been taken as an indication of better accessibility for HF items, both on the level of lexical planning and on the levels of phonological and phonetic planning. It can be speculated that the enhanced processing of HF items is a general mechanism in speech production and is applicable to all linguistic units relevant in processing. According to usage-based linguistics, this is due to entrenchment through repeated use. For example, Bybee (1999, p. 232) describes the atoms of lexemes as “a set of highly entrenched gestures and gestural configurations that are used and reused”. The more often a particular gesture is used, the more entrenched it becomes and the more easily it can be executed. A more general explanation, not restricted to speech gestures, comes from connectionist models, in which HF units are more strongly activated. For these reasons, frequency of use is a factor included in many models of speech production (e.g., Dell, 1986; Dell et al., 1993; Gaskell & Marslen-Wilson, 1997; Levelt et al., 1999; Roelofs, 1997; Wade et al., 2010).
¹ Older sources (Garnham et al., 1982; Leuninger, 1993) estimate fewer than five errors per 1000 spoken words, while more recent investigations using more reliable techniques (Alderete & Davies, 2019) show that this is probably a considerable underestimation of naturally occurring speech errors.
On the other hand, evidence from clinical studies (e.g., Miozzo & Buchwald, 2013; Stenneken et al., 2005) suggests that sonority principles also influence ease of production of linguistic items. Sonority theory has mostly been used to account for cross-linguistic regularities and preferences in syllable structure (e.g., Baertsch, 2012; Carnie, 1994; Cser, 2012; Sievers, 1897). Its basic premise is that the phones of a syllable should be arranged in keeping with their sonority values. According to the Sonority Sequencing Principle (SSP; Selkirk, 1981; Sievers, 1897), the sonority of segments in a syllable should increase from the onset to the nucleus and decrease from there on. A more fine-grained criterion of well-formedness is applied in the Sonority Dispersion Principle (SDP; Clements, 1990), which states that the sonority rise in the onset should be maximised and the sonority decline after the syllable peak minimised. Syllables are said to be more complex and less well-formed as a function of their deviance from these principles. Hence, sonority sequencing is a markedness phenomenon of the kind that Optimality Theory was also inspired by and builds on (see also Smith & Moreton, 2012): syllables can be more or less well-formed, but sonority constraints are violable (in that violations exist in many languages). Within psycholinguistics, effects of sonority sequencing have mainly been reported in perception, showing better perceptibility of SSP-obeying sequences (e.g., Berent et al., 2007; Berent et al., 2014). Sonority effects on production have almost exclusively been observed in speakers with acquired speech impairments. Studies with patients with impairments on the phonological and phonetic levels and potentially articulatory programming indicate that syllables that are more complex in terms of sonority principles are harder to process and therefore produced less accurately (Miozzo & Buchwald, 2013; Romani & Calabrese, 1998), and that mispronunciations tend to result in productions with optimised syllable structure, turning, for example, sport into port (Miozzo & Buchwald, 2013; see also Béland et al., 1990; Kohn et al., 1998; Romani & Calabrese, 1998). Syllable structure in neologisms also seems to be constrained by sonority principles (Stenneken et al., 2005). Although some studies do not find clear sonority effects on production accuracy (e.g., Kohn et al., 1998) or on error patterns (Miozzo & Buchwald, 2013), most studies indicate that sonority principles are relevant to cross-linguistic preferences for certain syllable types and directly affect syllable production. Romani et al. (2013) observed an unusual pattern: patients (phonological and apraxic) did not show a tendency for more errors on words with difficult sonority profiles, but when they did produce errors, the sonority profile was improved much more often than not. A number of studies report that the sonority profiles of most productions remain unchanged compared to target words, but the few changes that do occur are in the direction of sonority optimisation (Christman, 1994; Kohn et al., 1998). Although it is clear that sonority is not the only factor affecting impaired speech production, its influence is rarely compared to that of other factors directly. There are a few notable exceptions, however. Most importantly, Romani et al. (2013) observed strong effects of segment frequency, sonority, and markedness in apraxic speech errors and found that, when disentangled from frequency, sonority shows the strongest effects. That is, far more errors improved the sonority profile of a target word while segment frequency decreased than the other way around. The similarly prominent role of sonority in relation to language-specific sequencing preferences became apparent in a large corpus of English and German non-lexical aphasic speech automatisms (Code & Ball, 1994). Here, there were no cases in which language-specific phonotactics superseded the SSP. This means that, although aphasic speech generally follows language-specific phonotactics, phonotactic sequences which violate sonority are avoided. In contrast, Stenneken et al. (2005) note that, in spite of the high overall compliance with sonority principles in their patient’s (a German Wernicke aphasic) production data, his deviations from the general pattern may be due to a relatively high number of syllable-initial /sp/ and /st/ productions. This deviation is interesting because it reflects the frequency of these German exceptions to sonority principles and thus the potential interaction of sonority and frequency principles driving the patient’s productions.²
Moreover, the presence of a sonority sequencing effect seems to depend to some degree on the specific impairment of the patient. Differences in sonority sensitivity between groups of patients with impairments on different levels have been taken as evidence for the level in speech production on which sonority exerts an influence. Based on the locus of impairment of the patients who displayed the strongest sonority effects in their studies and/or on diverging behaviour in different tasks, researchers have argued for the phonological lexicon (Kohn et al., 1998; Romani et al., 2011), phonological encoding (Bastiaanse et al., 1994; Stenneken et al., 2005), and articulatory planning (Romani & Calabrese, 1998) as the level at which the effects of sonority sequencing arise.
Further support for the psycholinguistic relevance of sonority principles comes from language acquisition research. Both in first-language (L1) acquisition and in second-language (L2) acquisition, speakers commonly reduce complex syllables in a way that optimises their sonority profile (Barlow, 2005; Broselow & Finer, 1991; Hanulíková & Dietrich, 2008; Ohala, 1999; Yavaş, 2003). Moreover, new words seem to be acquired more easily when they conform to sonority sequencing (Ulbrich et al., 2016). In contrast, sonority effects in unimpaired speech production are rarely reported.
² It is startling, though, that /sp/ and /st/, rather than /ʃp/ and /ʃt/, are noted by Stenneken et al. (2005) to be over-represented. One may speculate whether this shows a bias for a universally unmarked phoneme while keeping the natural class of the L1-frequent marked phoneme intact.
This paper investigates which factors facilitate the production of syllable-initial consonant clusters in unimpaired speakers. As described above, there are two different approaches that could explain differences in processing difficulty between consonant clusters: one based on a universal, natural bias for more well-formed ones and the other based on familiarity due to language-specific distributions. The latter, usage-based explanation (e.g., Bybee, 1999, 2010) entails that consonant clusters are entrenched as units: the more frequently they are used, the more entrenched they are. This notion of entrenchment is supported by a speeded naming study (Kawamoto & Kello, 1999), in which onset clusters had shorter latencies and partially lower error rates than simple onsets, thus suggesting strong cohesiveness and holistic processing. The universalistic phonological account (e.g., Berent, 2013) assumes that a phonological principle known to govern phonotactic distributions cross-linguistically, namely sonority sequencing, has a direct influence on speech production. Both accounts receive empirical support to some degree. In this paper, we take a usage-based stance but contrast it with the phonological perspective. We use the term phonological in the sense of a shallow phonology, denoting the speech sounds that are distinguished on a surface level.
It has been shown that HF clusters and HF bigrams in general tend to be produced with shorter durations than LF clusters/bigrams (Edwards et al., 2004; Munson, 2001; Pouplier, Marin, et al., 2017) but often fail to show significant frequency effects on reaction times (Andrews, 1992; Bose et al., 2007). In Russian, HF clusters are also produced with greater overlap as speech rate increases, which is not the case for LF clusters (Pouplier, Marin, Hoole, et al., 2017). Accuracy differences between LF and HF bigrams have been observed both for delayed word naming (Andrews, 1992) and pseudoword repetition (Edwards et al., 2004, the effect being more pronounced in children than in adults), although Munson (2001) finds the accuracy effect in pseudoword repetition to be significant only for children but not for adults. It should also be noted that in the word naming task (Andrews, 1992), effects of bigram frequencies were less prominent than those of word frequencies. Importantly, however, several studies found a tendency for HF phonemes to replace LF phonemes in speech errors (Goldrick, 2002, 2004; Levitt & Healy, 1985). Specifically, one of the first studies inducing speech errors (Motley & Baars, 1975) revealed that more errors were made in which the intruding phoneme is more probable (in terms of transitional phoneme probability as well as positional probability) than errors in which it is less probable than the competing phoneme. The production of HF items thus seems to constitute a kind of default action: in the event of processing difficulties, productions result in the most frequent structures, potentially creating speech errors.
At the same time, the direction of the substitution in errors and the general error tendency do not show reliable patterns across studies. In Shattuck-Hufnagel and Klatt’s (1979) seminal study on the phoneme frequency effect, there was a correlation between the frequency of a consonant and its participation in speech errors, but the consonants’ involvement as targets and intruders was symmetrical. Moreover, Santiago et al. (2007) observed a tendency of LF phonemes to replace HF phonemes in speech errors and Stemberger (1991) found a frequency effect in non-contextual errors and an “anti-frequency effect” in contextual speech errors. That means that in the case of direct competition between an HF and an LF phoneme, the latter could easily win over the former and replace it. Stemberger ascribes this to HF phonemes’ featural underspecification and a general addition bias in speech errors, that is, a tendency for features and phonemes to be added rather than deleted. However, he notes that this anti-frequency effect is not found with nonce words. It is therefore presently unclear exactly under what circumstances frequency effects occur. Less complex production tasks, like reading, simple naming, and repetition, seem to produce null results with regard to production accuracy more often than cognitively more demanding tasks (e.g., Brendel et al., 2008; Croot & Rastle, 2004; Laganaro & Alario, 2006).
Another factor that can influence production is phonological neighbourhoods. High sublexical frequency (e.g., phonotactic probability, biphone/bigram frequency) of a word or nonce word is usually correlated with a high neighbourhood density and a high neighbourhood frequency (Vitevitch & Luce, 1998; Arduino & Burani, 2004). The role of phonological neighbourhoods in speech production is far from conclusive, however, with some studies finding facilitative neighbourhood effects (Andrews, 1992; Harley & Bown, 1998; Vitevitch, 2002, 2003) and some inhibitory effects (Sadat et al., 2014; Vitevitch & Stamer, 2006). Sadat et al. (2014) reanalysed a number of previous studies, using finer statistical methods, and found converging evidence for an inhibitory effect on response latencies. They note that this only applies to unimpaired production, though, while facilitative effects can emerge in cases of disrupted speech production. Due to sublexical frequencies’ correlation with neighbourhood density and frequency, an inhibitory effect of neighbourhoods could obscure a sublexical frequency effect, while a facilitating effect could inflate it.
2. The present study
Previous studies have produced inconclusive results with regard to whether HF units are produced more accurately than LF units. Sonority effects, on the other hand, have almost exclusively been observed in impaired populations and in language acquisition, not in unimpaired processing. This raises three questions: 1) Is the production of initial consonant clusters facilitated by high frequency of use? 2) Is sonority sequencing influential in unimpaired production in a cognitively demanding task? 3) What is the relative weight of the two factors in unimpaired speech production under high processing load?
Although the need for investigations directly comparing the influences of sublexical frequencies and sonority sequencing has been noted (Miozzo & Buchwald, 2013), to our knowledge, so far only one study (Romani et al., 2013) has done so, but again the population studied was individuals with aphasia.
The aim of the present study was therefore to determine to what degree unimpaired processing of consonant clusters for production is influenced by universal preferences based on sonority sequencing and to what degree practice and experience with a specific phonotactic system, as measured by frequency, override these prior biases and determine ease of production. The study tests whether production accuracy increases for clusters with a high frequency of use and/or increasing sonority distance between the consonants, i.e., more well-formed clusters.
The focus is on German, a language that permits a relatively large number of initial consonant clusters of various sonority distances, including sibilant–stop clusters, which violate the SSP. The latter are spread over a wide frequency range so that the effects of frequency and sonority are easily disentangled. The /s/-initial clusters /st/, /sp/, and /sk/ have a low frequency, while /ʃt/ and /ʃp/ are among the most common initial clusters in German, which makes these clusters an interesting object of study.
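To make the notion of sonority distance concrete, the following minimal sketch computes the sonority rise of a two-consonant onset and flags SSP violations. The five-step sonority scale and its numeric values are an illustrative assumption (the scale used in the study is not specified in this section), and the function names are hypothetical.

```python
# Illustrative sonority scale (an assumption, not the scale used in the study):
# stops < fricatives < nasals < liquids < glides.
SONORITY = {
    "p": 1, "t": 1, "k": 1, "b": 1, "d": 1, "g": 1,  # stops
    "f": 2, "s": 2, "ʃ": 2, "v": 2, "z": 2,          # fricatives
    "m": 3, "n": 3,                                  # nasals
    "l": 4, "r": 4,                                  # liquids
    "j": 5,                                          # glides
}


def sonority_distance(cluster):
    """Return sonority(C2) - sonority(C1) for a two-consonant onset."""
    c1, c2 = cluster
    return SONORITY[c2] - SONORITY[c1]


def obeys_ssp(cluster):
    """An onset obeys the SSP if sonority rises towards the nucleus."""
    return sonority_distance(cluster) > 0


# Clusters are given as (C1, C2) tuples so that each phoneme is one element.
for cluster in [("t", "r"), ("f", "l"), ("k", "s"), ("ʃ", "t"), ("s", "k")]:
    label = "/" + "".join(cluster) + "/"
    print(label, sonority_distance(cluster), obeys_ssp(cluster))
```

Under this assumed scale, stop–liquid onsets such as /tr/ show the steepest rise, whereas sibilant–stop onsets such as /ʃt/ and /sk/ show a sonority reversal, in line with the description above.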
We hypothesised that in situations of direct competition between two consonant clusters, the HF cluster would win more often than the LF cluster. Therefore, LF clusters should have higher error rates than HF clusters. At the same time, speech errors should result in the production of an HF cluster more often than in the production of an LF cluster. However, if findings from studies on apraxic speech production can be generalised to unimpaired speech production, then clusters exhibiting a steep rise in sonority should have a processing advantage and hence show lower error rates than clusters exhibiting a more gradual rise or a sonority reversal. Speech errors should then improve the sonority profile of a cluster, that is, maximise the initial sonority rise.
We expected the influence of frequency to be strong enough to obliterate prior biases. For clusters for which frequency-based and sonority-based predictions concerning production accuracy diverge, we therefore expected the prediction based on frequency to be more accurate. For example, initial /ʃt/ violates the SSP, so sonority theory would predict it to be error-prone. On the other hand, it is very frequent in German, which makes it error-resistant according to usage-based theory. We expected the influence of language-specific frequency to be stronger than that of universal sonority sequencing and therefore predicted /ʃt/ to be relatively error-resistant. For LF clusters, the influence of sonority sequencing on production accuracy could then potentially be stronger than for HF clusters.
We tested these hypotheses in a tongue twister paradigm: German speakers repeated pairs of monosyllabic pseudowords that began with similar German consonant clusters at a fast rate. Stimuli were paired so as to cause competition between the two similar syllable-initial consonant clusters during the planning stage. In combination with the fast speech rate, this induced speech errors. We used phonemic speech errors as a measure of production accuracy, a relatively coarse qualitative measure of production difficulty, which contrasts with many studies employing the quantitative measure of production latencies. Speech errors can be seen as an indication of competing speech plans (Baars, 1980) and have a long history of application for revealing the inner workings of the speech production system.
3. Methods
3.1 Participants
Forty-one young adults (26 female, 15 male; mean age: 22.92; SD = 3.48) participated in the study and received monetary compensation for their participation. All of them were native speakers of German and reported speaking no German dialect. Participants who grew up in the south of Germany were excluded from the study because experience with one of the test clusters, /ks/, might deviate substantially from that of participants from other regions. Six of the subjects reported having undergone speech therapy in the past and one subject reported a speech development disorder that was not treated but disappeared over time. Six of these seven subjects had error rates above average, especially for the most difficult onset clusters (i.e., those with high error rates). Their data were included in the data set as some subjects with no history of developmental issues showed similar error rates and patterns. All subjects gave written informed consent before participation and were free to terminate their participation at any point. (Ethics approval was not required for this study because there is no obligation for ethics approval of psychological studies in Germany.)
3.2 Materials
The set of test clusters consisted of 16 legal initial German consonant clusters of varying frequency and sonority distance (see Figure 1). Frequency values were based on CELEX type frequencies (including spoken and written corpora) and extracted by querying syllable onsets in WebCELEX (http://celex.mpi.nl/). Type rather than token frequencies were used because they are considered to be the more relevant measure when it comes to phonotactic effects (cf. Munson, 2000, for production; Hay, Pierrehumbert & Beckman, 2004, for perception; Richtsmeier, 2011, for rating).
The consonant clusters were arranged into 10 pairs (see Table 1) of similar clusters. There were two conditions³ for cluster pairing: Minimal pairs differ in one feature in one of the consonants (e.g., /tr/ and /kr/ differing in place of articulation in C1), while so-called metathesis⁴ pairs are composed of the same two consonants in reversed order (e.g., /sk/ and /ks/). These pairings were created to increase competition between consonant clusters. Previous research has shown that similarity of segments and repeated segments in a sequence increase the probability of speech errors (e.g., Cohen-Goldberg, 2012; O’Seaghdha & Marin, 2000; Pouplier & Goldstein, 2010; Shattuck-Hufnagel & Klatt, 1979). Four consonant clusters (/ʃt/, /ks/, /ps/, /fl/) appeared in two pairs. For each cluster pair, eight pairs of monosyllabic pseudowords with CCVV, CCVː, or CCVC structure were created as stimuli, totalling 80 stimulus pairs. Pseudowords were chosen in order to minimise lexical effects and direct attention to the sublexical level. Moreover, pseudowords have been shown to elicit a higher error rate than real words (Wilshire, 1998).
All stimulus syllables conformed to German phonotactics and included all German vowels and diphthongs (/aː, a, eː, ɛ, iː, ɪ, oː, ɔ, uː, ʊ, øː, œ, yː, ʏ, aɪ, ɔʏ, aʊ/), as well as all licit simple codas (/p, t, k, f, s, ʃ, ç, x, m, n, ŋ, l, ɐ/). Stimulus pairs with identical vs. different vowels were balanced across onset clusters. In all stimulus pairs except those with onset /ts/–/ks/, half of the stimulus syllables had an identical vowel, while the other half differed in vowel. For /ts/–/ks/ stimuli, 10 stimulus pairs had different and six identical vowels. For each cluster pair, in approximately one third of the stimuli the two syllables shared the whole rime, whereas in the other two thirds, the rime differed between the two syllables. Seven syllables (/flɛm/, /kseːl/, /ksɛl/, /psɪç/, /ʃluː/, /ʃtɪŋ/, /tʃaː/) occurred in more than one stimulus pair. In addition to the 80 test item pairs, 50 pairs of filler items with CVː, CVV, VC, or CVC structure were constructed. Two lists of stimuli were created in order to counterbalance the order of the two syllables in a stimulus pair (e.g., /ʃtœf tʃaf/ and /sloːn fliːm/ in list A; /tʃaf ʃtœf/ and /fliːm sloːn/ in list B).
³ The two stimulus conditions were not directly connected to frequency and sonority predictions but used mainly to increase competition between onset clusters. Frequency and sonority distance diverged in some, but not all, pairings. In the main analysis, a logistic regression model was used to estimate effects of frequency and sonority, collapsing errors over stimulus pairs.
⁴ The term metathesis is used here for convenience to describe cluster pairs in which the onset consonants appear in reversed order relative to the partner cluster. It is not meant to describe the historical process of consonant metathesis.
In order to avoid sequence effects, the order of the stimuli was pseudorandomised for each participant separately using the software Mix (van Casteren & Davis, 2006). The constraints for pseudorandomisation were as follows: 1) No more than three test items occurred in direct succession before a filler item intervened. 2) There was a minimal distance of three trials before the same consonant cluster pair was repeated; this was used to prevent practice effects for any particular cluster. 3) The same two cluster pairs could only alternate four times before a stimulus with a different cluster pair occurred. (In reality, no two cluster pairs alternated that often.) 4) Items of the highest articulation difficulty class were separated by at least one trial; this was done to reduce fatigue effects. 5) The same vowel in either syllable 1 or syllable 2 could occur in no more than two consecutive trials.
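As an illustration of how such ordering constraints can be verified, the sketch below checks a candidate trial order against constraints 1, 2, and 5. It is not the algorithm implemented in Mix; the trial representation (dictionaries with the hypothetical keys `is_filler`, `cluster_pair`, and `vowels`) is an assumption, as is the reading of "a minimal distance of three trials" as a difference of at least three list positions.

```python
def violates_constraints(trials, max_test_run=3, min_pair_distance=3,
                         max_vowel_run=2):
    """Return True if the trial order breaks constraint 1, 2, or 5."""
    test_run = 0                 # length of the current run of test items
    last_seen = {}               # cluster pair -> index of last occurrence
    vowel_runs = [0, 0]          # run length of the vowel in syllable 1 / 2
    prev_vowels = [None, None]

    for i, trial in enumerate(trials):
        # Constraint 1: at most `max_test_run` test items in direct succession.
        test_run = 0 if trial["is_filler"] else test_run + 1
        if test_run > max_test_run:
            return True

        # Constraint 2: the same cluster pair must be at least
        # `min_pair_distance` positions apart (assumed interpretation).
        pair = trial["cluster_pair"]
        if pair is not None:
            if pair in last_seen and i - last_seen[pair] < min_pair_distance:
                return True
            last_seen[pair] = i

        # Constraint 5: the same vowel in a given syllable position may occur
        # in at most `max_vowel_run` consecutive trials.
        for pos in (0, 1):
            vowel = trial["vowels"][pos]
            vowel_runs[pos] = vowel_runs[pos] + 1 if vowel == prev_vowels[pos] else 1
            prev_vowels[pos] = vowel
            if vowel_runs[pos] > max_vowel_run:
                return True
    return False


def make_trial(pair, vowel_1, vowel_2):
    """Hypothetical helper: fillers are trials without a cluster pair."""
    return {"is_filler": pair is None, "cluster_pair": pair,
            "vowels": (vowel_1, vowel_2)}


ok_order = [make_trial("tr-kr", "a", "i"), make_trial(None, "o", "u"),
            make_trial("sk-ks", "e", "a"), make_trial("tr-kr", "i", "o")]
bad_order = [make_trial("tr-kr", "a", "i"), make_trial("sk-ks", "e", "a"),
             make_trial("tr-kr", "i", "o")]
print(violates_constraints(ok_order), violates_constraints(bad_order))
```

A checker of this kind could be combined with repeated shuffling to obtain an admissible order; constraints 3 and 4 are omitted here because they depend on information (alternation counts, difficulty classes) not detailed in this section.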
The stimuli were spoken by a female native speaker of German and recorded with an AKG C2000B microphone in a soundproof booth using Adobe Audition. The recording was saved directly on a computer with a sampling rate of 44.1 kHz (16-bit resolution). All syllables were recorded several times and the best token of each syllable was selected for inclusion in the set of audio stimuli. Using Praat (Boersma & Weenink, 2018), all stimulus syllables (i.e., test and filler items) were then normalised to 65 dB SPL and concatenated in pairs with a 500 ms silence between them and a 500 ms silence after the second syllable to form the stimulus pairs.
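The layout of a stimulus pair (syllable, 500 ms silence, syllable, 500 ms silence) can be sketched as follows. This is a plain-Python illustration of the sample arithmetic at 44.1 kHz, not the Praat procedure used for the study; intensity normalisation is left out, and the placeholder "syllables" and function names are assumptions.

```python
SAMPLE_RATE = 44_100   # Hz, as reported for the stimulus recordings
SILENCE_MS = 500       # silence after each syllable, in milliseconds


def silence(duration_ms, sample_rate=SAMPLE_RATE):
    """Return a list of zero-valued samples of the requested duration."""
    return [0.0] * round(sample_rate * duration_ms / 1000)


def build_stimulus_pair(syllable_1, syllable_2):
    """Concatenate: syllable 1 + silence + syllable 2 + silence."""
    return (syllable_1 + silence(SILENCE_MS)
            + syllable_2 + silence(SILENCE_MS))


# Placeholder sample lists standing in for two recorded syllables
# (300 ms and 400 ms long; real stimuli would be loaded from audio files).
syllable_a = [0.1] * 13_230
syllable_b = [0.2] * 17_640

pair = build_stimulus_pair(syllable_a, syllable_b)
print(len(silence(SILENCE_MS)), "samples per 500 ms silence")
print(len(pair), "samples in the stimulus pair")
```

Each 500 ms silence corresponds to 22,050 samples at this sampling rate, so a stimulus pair is always 44,100 samples (1 s) longer than its two syllables combined.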
3.3 Design and procedure
Prior to the experiment proper, participants completed a questionnaire and carried out a short speech production task and a forward digit repetition task. The speech production task consisted of the casual reproduction of four sentences printed on a sheet of paper and was conducted to check for regional influences on pronunciation. Participants were instructed to read the sentences silently and then speak them out loud as if they were saying them in an informal conversation. All of the productions were rated as standard-like by the first author, so all participants were included in the experiment.
The digit repetition task was administered to control for effects of working memory. Participants listened to four rows of six digits each. After each row, they had to repeat the six digits orally once. The digit repetition task and the main experiment were carried out in OpenSesame (Mathôt et al., 2012).
In the main experimental task, a tongue twister paradigm was used to elicit speech errors.
334
The tongue twister task was similar to one that has been proven to effectively elicit contextual
335
speech errors (Dell et al., 1997; Vousden & Maylor, 2006; see Wilshire, 1999, for a detailed
336
evaluation of the method). Instead of real words forming a phrase, we used the pseudoword pairs
337
exemplified in Table 1. Stimuli were presented auditorily. During each trial, participants heard a
338
stimulus pair twice over head-phones at a slow pace while the screen remained black. Shortly
339
after the offset of the last stimulus syllable, a white fixation dot appeared in the middle of the
340
screen to indicate that the participant should prepare to speak. The following production phase
341
was divided into a familiarisation phase and an elicitation phase. During the familiarisation
342
phase, participants repeated the stimulus sequence once at a pace of 63 beats per minute (bpm).
343
The purpose of the familiarisation phase was twofold: firstly, participants were given the chance
344
PRODUCTION OF INITIAL CONSONANT CLUSTERS
16
to produce a stimulus slowly before the challenging task of repeating it quickly (hence
345
“familiarisation”). Secondly, the recordings of the productions served to check whether or not the
346
stimuli had been correctly perceived. After this slow production, another white fixation dot
347
appeared in the middle of the screen to prepare participants for the error elicitation phase, in
348
which they repeated the sequence four times without a pause at a pace of 144 bpm. The speed for
349
stimulus production was indicated by auditory metronome clicks presented to the subjects over
350
headphones. Productions that were noticeably slower than the predefined speed were excluded
351
from analysis. The experiment was self-paced; a mouse click initiated the next trial.
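To give a feel for the two pacing conditions, the following sketch converts a metronome tempo into the interval between clicks. It assumes that one syllable was produced per click, which is not stated explicitly in the text; the function name is purely illustrative.

```python
def click_interval_seconds(bpm: float) -> float:
    """Return the time between two metronome clicks for a given tempo."""
    return 60.0 / bpm


# Tempi reported in the procedure.
familiarisation_interval = click_interval_seconds(63)   # slow repetition
elicitation_interval = click_interval_seconds(144)      # fast repetition

# Assumption: two syllables per stimulus pair, repeated four times,
# with one syllable per click.
syllables_in_elicitation = 2 * 4
elicitation_duration = syllables_in_elicitation * elicitation_interval

print(f"familiarisation: {familiarisation_interval:.2f} s per click")
print(f"elicitation:     {elicitation_interval:.2f} s per click")
print(f"elicitation phase lasts about {elicitation_duration:.1f} s")
```

Under these assumptions, the fast phase leaves well under half a second per syllable.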
Subjects’ productions were recorded over an AKG HSD 171 headset connected to a Focusrite iTrack Solo interface in wav-file format in Praat (Boersma & Weenink, 2018) on a MacBook Pro. Additionally, recordings were made on a tape recorder with an internal microphone placed about 50 cm from the participant in case there was a problem with the primary recording. This backup was used for annotation of the data of the first two subjects due to technical problems with the primary recording. All the other annotations were based on the primary recording. Subjects were instructed to comment on auditory or memory-related uncertainty (i.e., if they did not perceive the audio stimulus accurately or got confused during the elicitation phase and forgot what the target was) after their productions. Productions followed by such a comment were excluded from the analysis, as were productions from trials in which the productions in the familiarisation phase deviated from the target. Four practice trials consisting of stimuli with a simple CV structure were given to familiarise participants with the task and the rhythm. Participants who were unable to complete the task after the four practice trials were asked to repeat them. This was the case for five participants. The total duration of the experiment varied between participants from around 45 to 60 minutes.
3.4 Data preparation and analysis
3.4.1 Data preparation
All productions were transcribed according to SAMPA conventions (broad phonetic transcription). Since the productions of each trial were continuous repetitions of the stimulus, it was not always clear-cut whether a consonant belonged to the coda of one syllable or the onset of the following syllable. It was assumed that phones were assigned to the correct syllable positions during production, so for consonants produced between the two vowels of a stimulus pair it was taken into consideration whether that consonant phoneme occurred in coda or onset position in one of the target syllables. For example, the interconsonantal [f] in [flaʊnfʃloːm] for target /flaʊn ʃloːm/ would be counted as belonging to the onset of the second syllable; [m] in [flaʊmnʃloːm] as belonging to the coda of the first syllable. If a produced phone was not part of the target sequence in either position (e.g., the [f] in [spaʊfspaʊ] for target /spaʊ psaʊ/), the response was excluded from the analysis. A random sample of approximately 13% of all produced test syllables (3,423 syllables stemming from nine different subjects) was transcribed by a second rater, a trained linguist with transcription experience, who was naïve as to the object of the experiment. Inter-rater reliability between onset transcriptions was very high (Krippendorff’s α = 0.932).
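For readers unfamiliar with the reliability coefficient, the following minimal sketch computes Krippendorff's α for nominal data in the simplest setting: two raters, no missing values. It is not the authors' script, and the toy transcriptions in the usage lines are invented for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(rater_a, rater_b):
    """Krippendorff's alpha for nominal labels, two raters, no missing data."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must label the same units.")

    # Coincidence matrix: each unit contributes both ordered label pairs.
    coincidences = Counter()
    for a, b in zip(rater_a, rater_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1

    n = 2 * len(rater_a)  # total number of labels assigned
    totals = Counter()    # marginal frequency of each label
    for (label, _other), count in coincidences.items():
        totals[label] += count

    # Observed and expected disagreement (only mismatching pairs count).
    observed = sum(c for (a, b), c in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a, b in permutations(totals, 2))
    if expected == 0:
        raise ValueError("Alpha is undefined when only one label occurs.")
    return 1.0 - (n - 1) * observed / expected


# Invented toy onset transcriptions (SAMPA-like labels), for illustration only.
first_rater = ["St", "St", "ps", "ps"]
second_rater = ["St", "St", "ps", "St"]
print(round(krippendorff_alpha_nominal(first_rater, second_rater), 3))
```

A value of 1 indicates perfect agreement and 0 indicates agreement at chance level.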
The following criteria led to exclusion of a production from analysis: 1) The whole syllable could not be unambiguously ascribed to one of the two target syllables (0.07%). 2) The production was unintelligible (0.27%). 3) The production clearly deviated from the speed indicated by the metronome (0.31%). 4) The onset was already produced incorrectly in the familiarisation phase (6.05%). 5) The subject made a comment about perception or memory problems (5.15%). 6) No response was provided (1.01%). Data from two subjects were excluded from the analysis entirely because they failed to correctly produce more than half of the test items during the familiarisation phase and were clearly not concentrating on the experimental task. Furthermore, two stimuli (/tsaː ksaː/ and /fløː sluː/) in list B were excluded because the wrong audio file was attached for one of the syllables. (The corresponding stimuli /ksaː tsaː/ and /sluː fløː/ in list A were included.) This left 21,758 observations of the original 26,240 observations (8 productions per trial × 80 test trials × 41 subjects) in the remaining data set.
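The observation counts can be checked with a few lines of arithmetic. The variable names are illustrative; all numbers are taken from the paragraph above.

```python
productions_per_trial = 8
test_trials = 80
subjects = 41

# Size of the full design before any exclusions.
original_observations = productions_per_trial * test_trials * subjects

retained_observations = 21_758
excluded_observations = original_observations - retained_observations
retained_share = retained_observations / original_observations

print(original_observations)       # size of the full design
print(excluded_observations)       # observations removed by all criteria
print(f"{retained_share:.1%} of observations retained")
```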
If target and produced onset deviated phonemically, this was counted as an error. Otherwise, the production was counted as correct. Subphonemic anomalies (observed in 807 productions) and the syllable rime were not analysed further.
3.4.2 Analyses
To determine which factors make a consonant cluster error-prone, a mixed-effects logistic regression model was fitted with the lme4 package (Bates et al., 2015) in R (R Core Team, 2016) with error as the binary dependent variable. The initial model contained the following variables: log cluster frequency (continuous), sonority distance (factor with four levels), summed frequency of neighbouring consonant clusters (continuous), type of stimulus pairing (metathesis or minimal pair), coda in the previous syllable (making the onset more complex in continuous speech), coda condition (coda identical or different in the two syllables of a stimulus pair or no coda), and digit repetition score (continuous). The model also included an interaction between log cluster frequency and sonority distance. After fitting this model, non-significant variables were taken out stepwise, and AICs for the models with and without the predictor were compared until no non-significant predictors were left in the model. The model included random intercepts for subjects and items and random by-subject slopes for the frequency and sonority effects. All continuous variables were centred and factor variables were sum-coded. Since an error is likely to lead to further errors on the same item in the following repetitions (Humphreys et al., 2010), an analogous model was run in which only the first error of a trial on each syllable was counted. The results were largely the same and will not be discussed further.
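One plausible way to write down the model structure described above is sketched below. The notation is an assumption introduced here for illustration, not the authors' own: i indexes subjects, j indexes items, son denotes the three sum-coded contrasts of the four-level sonority factor, and c collects the remaining fixed-effect predictors (including, in the initial model, the frequency × sonority interaction).

```latex
\begin{aligned}
\operatorname{logit} P(\mathrm{error}_{ij} = 1)
  ={}& \beta_0
     + \beta_1\,\mathrm{logfreq}_{j}
     + \sum_{k=1}^{3} \beta_{2k}\,\mathrm{son}_{jk}
     + \boldsymbol{\gamma}^{\top}\mathbf{c}_{ij} \\
   &+ u_{0i}
     + u_{1i}\,\mathrm{logfreq}_{j}
     + \sum_{k=1}^{3} u_{2ki}\,\mathrm{son}_{jk}
     + w_{0j}
\end{aligned}
```

Here the u terms are the by-subject random intercept and slopes and w is the by-item random intercept. In lme4 syntax, a model of this shape would typically be requested with a formula along the lines of `error ~ logfreq + sonority + ... + (1 + logfreq + sonority | subject) + (1 | item)` passed to `glmer` with `family = binomial`; the variable names in that formula are hypothetical.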
Additionally, error rates of some clusters were compared pairwise. These observed differences in error rates were compared with predictions based on cluster frequency and sonority sequencing as to which cluster would have a higher error rate. Only pairs for which predictions based on frequency differed from those based on sonority were considered.
To assess the potential improvement of clusters (in terms of frequency and sonority) in speech errors that result from direct competition within the stimulus pair, log frequencies and sonority distances of target and produced clusters were compared. Only syllables produced with an onset cluster were included in this analysis since the frequency of a cluster cannot be compared to that of a single consonant and simple onsets naturally do not have a sonority distance. However, all productions of legal German CC onsets were considered, not only the set of test clusters.
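The target-versus-produced comparison can be sketched as follows. The log frequencies in the lookup table are placeholders invented for illustration, not the CELEX values used in the study, and the function names are hypothetical.

```python
from collections import Counter

# Placeholder log frequencies (SAMPA-like keys); NOT the values from the study.
LOG_FREQ = {"St": 3.0, "ts": 2.5, "sp": 0.8, "ps": 0.5}


def frequency_change(target, produced, log_freq):
    """Classify a substitution by whether the produced cluster is more frequent.

    Returns None when either onset is not a cluster in the lookup table,
    mirroring the restriction to legal CC onsets.
    """
    if target not in log_freq or produced not in log_freq:
        return None
    difference = log_freq[produced] - log_freq[target]
    if difference > 0:
        return "higher"
    if difference < 0:
        return "lower"
    return "equal"


def tally_changes(substitutions, log_freq):
    """Count substitution outcomes, skipping pairs that cannot be compared."""
    outcomes = (frequency_change(t, p, log_freq) for t, p in substitutions)
    return Counter(outcome for outcome in outcomes if outcome is not None)


# Invented example errors: (target onset, produced onset).
errors = [("ps", "ts"), ("ts", "ps"), ("ps", "t")]
print(tally_changes(errors, LOG_FREQ))
```

The same skeleton applies to sonority distance if the lookup table holds sonority distances instead of log frequencies.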
4. Results
Of the 21,758 test syllables left in the experiment corpus, 3,008 included a production error, yielding an overall error rate of 13.8%. Error rates varied between subjects from 1.9% to 29.4%.
Error rates also varied considerably over onset clusters; they ranged from 2.7% for /tr/ to 35.6% for /ks/ and /tʃ/ (see Figure 2). However, they also diverged greatly within the same target cluster depending on which cluster it was paired with (see Figure 3). The three clusters that are part of two different pairs had a much higher error rate in the metathesis pair (i.e., in a stimulus together with their reversed counterpart) than in the non-metathesis pair.
Around 42% of the errors were phonotactically illegal (non-native singleton onset phonemes as well as illegal onset clusters, see below). Moreover, 3.7% of all produced onsets (including target and non-target phonemes) were phonetically anomalous. Phonetic anomaly was determined by auditory inspection and included phonemes that were considerably shorter in duration than normal productions, segments that contained characteristics of two different phonemes (e.g., intensity peaks at several frequency bands, indicative of simultaneous constriction gestures at two different places of articulation), and stops whose VOTs fell in between those for voiced and voiceless stops, among others.
Addition errors were more than twice as frequent as deletion errors (33.4% vs. 14.2%). Many of the produced onsets that consisted of more than two consonants contained realisations of both competing consonants (e.g., /fsl–/ in target /fl–sl/ pairs or /sps–/ in /sp–ps/ pairs), which contributed to the high number of illegal onsets. Substitutions were the most frequent error type. Internal substitutions, that is, substitutions with the partner cluster, accounted for 39.0% of all errors, while external substitutions made up 13.4%.
4.1 Logistic regression
The final model for the complete dataset included log cluster frequency, sonority distance, stimulus pairing, complex cluster (i.e., a coda in the previous syllable), coda condition, and digit repetition score.
As can be seen in Table 2, there was an effect of frequency, in line with usage-based theory: the higher the frequency of a consonant cluster, the lower its error probability (see Figure 4).⁵ The effect of sonority distance differed significantly from the grand mean only for clusters with a sonority distance of 1 (i.e., the stop–sibilant clusters /tʃ/, /ts/, /ks/, and /ps/, as well as the sibilant–nasal clusters /ʃm/ and /ʃn/). For these clusters, the error rate was increased (see Figure 5). For clusters with a sonority distance of 2, the error rate was marginally significantly lower than the grand mean.
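Because the factors were sum-coded, each sonority coefficient is read as a deviation from the grand mean rather than from a reference level. The sketch below illustrates this reading for a single four-level factor; the log-odds values are invented, and the simplification ignores the other predictors and the random effects of the actual model.

```python
from statistics import mean


def sum_coded_effects(level_means):
    """Return the grand mean and each level's deviation from it.

    Under sum coding, the intercept corresponds to the unweighted mean of
    the level means, and each level's effect is its deviation from that mean.
    """
    grand_mean = mean(level_means.values())
    deviations = {level: value - grand_mean for level, value in level_means.items()}
    return grand_mean, deviations


# Invented log-odds of an error per sonority distance level (illustration only).
hypothetical_log_odds = {-1: -2.0, 1: -1.0, 2: -2.5, 3: -2.5}
grand_mean, deviations = sum_coded_effects(hypothetical_log_odds)

print(grand_mean)   # unweighted mean across the four levels
print(deviations)   # positive value = more errors than the grand mean
```

The deviations sum to zero by construction, which is why only three of the four are estimated directly and the fourth follows from the others.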
The effect of stimulus pairing was such that metathesis pairs had significantly higher error rates than minimal pairs. A coda in the previous syllable also led to significantly higher error rates. The effect of coda identity was significant for all levels: stimuli with identical and non-identical codas had significantly lower estimates and stimuli without a coda significantly higher estimates than the grand mean. The latter might seem surprising but is due to a confound: during stimulus creation, no coda was used for onset pairings that were particularly difficult to produce. As regards digit repetition, there was a tendency for participants with high error rates in this task to have increased error rates in the tongue twister task as well, but it was only marginally significant.
⁵ As described in the methods section, we used type frequencies as the main frequency measure. However, ongoing debate about which frequency measure, type or token, is most relevant for which task calls for a comparison of the two (e.g., Hofman et al., 2007). Moreover, CELEX frequencies have been subject to criticism and larger, more modern corpora advocated (e.g., Brysbaert et al., 2011). We therefore used three additional frequency measures in separate models not reported here: type frequencies extracted from the elexiko online dictionary (elexiko, 2003ff.), CELEX token frequencies, and CLEARPOND token frequencies (Marian et al., 2012). While elexiko type frequencies showed an effect similar to that of CELEX type frequencies, neither token frequency measure showed a significant effect.
4.2 Comparison of error rates within cluster pairs
Table 3 shows the comparison of error rates for cluster pairs along with their frequency- and sonority-based predictions (cf. Table 1). All reported differences in observed values are significant at the .01 level according to χ² tests. As can be seen from the table, frequency-based predictions were better for all cluster pairs for which frequency-based and sonority-based predictions diverged.
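A pairwise comparison of two error rates with a χ² test can be sketched as follows. The sketch uses the Pearson statistic without continuity correction for a 2 × 2 table, which is an assumption (the text does not say which variant was used), and the counts in the usage line are invented.

```python
import math


def chi_square_2x2(errors_a, total_a, errors_b, total_b):
    """Pearson chi-square test (df = 1, no continuity correction).

    Compares the error rates of two clusters given their error counts and
    numbers of observations. Returns the statistic and its p-value.
    Assumes at least one error and one correct production overall.
    """
    observed = [
        [errors_a, total_a - errors_a],
        [errors_b, total_b - errors_b],
    ]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [errors_a + errors_b, n - errors_a - errors_b]

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts: 30 errors in 100 productions vs. 10 errors in 100 productions.
statistic, p_value = chi_square_2x2(30, 100, 10, 100)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```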
4.3 Comparison of targets and produced onsets
Figure 6 shows the clusters that were produced in substitution errors. The frequency comparison of target and produced clusters in substitution errors revealed that a cluster was replaced by a higher-frequency cluster more than twice as often as by a lower-frequency cluster (see Table 4). There was, however, a large difference between LF and HF targets: LF targets were replaced by a higher-frequency cluster in 78.5% of all substitution errors, while HF clusters were replaced by clusters of even higher frequency in only 34.5% of substitution errors.
In the case of sonority profiles, however, the situation was reversed: most substitutions that led to a change in sonority distance worsened the sonority profile of the cluster. As in the frequency comparison, there was massive variation depending on the initial value of the target cluster: for SSP-violating clusters, 90.3% of substitutions improved sonority distance, whereas for SSP-conforming clusters, only 4.5% did.
5. Discussion
The aim of the present study was to compare the influences of language-specific phonotactic distributions and sonority sequencing on consonant cluster production. Specifically, it was investigated to what degree high cluster frequency and a large sonority distance contribute to facilitation in the production of initial German consonant clusters. German phonotactics allows initial sibilant–stop clusters, which display a decrease in sonority and thus violate the SSP. Since two of them (/ʃt/ and /ʃp/) are among the most frequent in the German language, they presented particularly interesting cases of diverging predictions. In line with usage-based theories of language (e.g., Bybee, 1999, 2010), we hypothesised that production accuracy would be higher for HF clusters than for LF clusters and that speech errors would tend to result in production of HF clusters. We will discuss the results in turn, starting with our main predictors and then turning to the control variables.
5.1 Frequency
In line with our prediction, cluster frequency had a facilitating effect on production accuracy. Error rates decreased as a function of cluster frequency. This is in line with a number of previous studies reporting frequency effects on other linguistic units, such as single segments (Levitt & Healy, 1985), words (Oldfield & Wingfield, 1965), and syllables (Bürki et al., 2015; Cholin et al., 2006; Laganaro & Alario, 2006), although the latter usually only find an influence of frequency on production latencies, not accuracy. Santiago et al. (2007) even observed inhibitory effects of phoneme, syllable, and word frequencies. The speech errors they analysed occurred in real words in spontaneous speech, however. Likewise, Stemberger (2004) reported “anti-frequency effects” in elicited speech errors on real words. Both situations are different from the artificially elicited errors in pseudowords reported here. The present data suggest that sublexical frequencies do play a role in processing but that their effect is often masked by larger effects in the opposite direction, usually stemming from the lexical level. By using pseudowords rather than real words, influences from the lexical level were minimised, which allowed the effect of cluster frequencies to emerge. Work by Vitevitch and Luce (1998) has revealed that on the lexical level, competition between lexical nodes usually dominates and causes inhibitory effects of neighbourhoods, whereas processing of nonwords directs attention to the sublexical level, so that facilitating effects of sublexical frequencies become visible. In fact, Stemberger (2004, p. 419) even predicted that the anti-frequency effect he found with words would be absent in experiments that use nonce words and assumed that in such cases even a weak frequency effect might be observable. This is indeed what can be seen in the present data. It suggests that, once lexical-level effects are minimised, the facilitative role of sublexical frequencies in speech production can surface. In everyday speech production, such an effect is more likely to be overridden by stronger effects on the lexical level.
For future experiments, it would be desirable to have more pairs with diverging frequency that do not have the added difficulty of consonant metathesis. Moreover, an influence of natural class seems to override the frequency effect: one of the most obvious results of the experiment is that stop–sibilant clusters cause production problems and easily lead to slips of the tongue (also in non-metathesis pairs, see Figure 3). In the cluster pairs that do not contain any of those clusters, error rates are consistently higher for the cluster of lower frequency, with the sole exception of /tr/–/kr/, which have very similar frequencies.
The sublexical frequency effect is largely in accordance with findings on the frequency effects of other sublexical units (e.g., Andrews, 1992; Levitt & Healy, 1985; Tremblay, 2016). As discussed above, however, frequency effects do not always surface and sometimes are only visible when examined with very sensitive measures, such as reaction times (e.g., Cholin, 2004). It is therefore remarkable that an effect of cluster frequency was visible at all, even with a coarse dependent variable such as repetition accuracy.
In terms of outcomes of speech errors, we hypothesised that slips would more often result in HF clusters than LF clusters. In other words: HF clusters attract responses. The reasoning behind this is that HF clusters present strong competitors for LF targets and, in noisy situations, can win the competition. The comparison of substitution errors resulting in higher- vs. lower-frequency clusters supports this hypothesis: consonant clusters were replaced by higher-frequency clusters more than twice as often as by lower-frequency clusters. The production of target /ps/ as [ts] is an example: even in the absence of direct syntagmatic competition (the two are not paired in a stimulus), the LF target /ps/ was produced as HF [ts] 60 times. The reverse is not true: target /ts/ was produced as [ps] only twice. This asymmetry replicates a finding by Motley and Baars (1975) that more errors occur when the intruding phoneme is more probable than the target and is also plausible from a theoretical perspective: if LF structures present the speaker with difficulties, the system should resort to an easier structure in order to overcome the problems rather than one of a similar level of difficulty as the target (cf., e.g., Reason, 1992).
Summing up, it can be stated that, all other things being equal, LF clusters are more error-prone than HF clusters, and errors tend to result in the production of an HF cluster. The fact that a facilitating effect of frequency emerged for consonant clusters, just like for other linguistic units previously, suggests that initial consonant clusters are a unit relevant in speech production planning. We argue that they (or at least some of them) have mental representations as holistic units and that frequent use strengthens these representations. This could parallel the storage of HF syllables in a mental syllabary assumed by Levelt and collaborators (e.g., Cholin et al., 2006; Levelt & Wheeldon, 1994). The underlying assumption is that during phonetic encoding, speakers can retrieve syllabic gestural scores of HF syllables from the syllabary, while no such entries exist for LF syllables; they have to be assembled on-line. Similarly, language users could overlearn HF clusters and develop mental representations for them, which makes them both more error-resistant and strong candidates to default to in production when processing load is high.
The sublexical frequency effects found here are also reminiscent of findings from studies with impaired populations that gave rise to the non-linear gestural (NLG) model of speech apraxia in which “the degree of gestural cohesion […] is […] modulated through extensive speech motor learning during language acquisition and, as a consequence, is language-specific” (Ziegler, Aichert, & Staiger, 2017:147). It should be noted, however, that the effects in the present study were effects of type frequency, not token frequency, and hence more likely related to a representational level than to speech motor learning.
The existence of frequency effects of sublexical units has implications for the interpretation of categorical phonotactic effects. If the human speech production system is biased towards the output of frequent strings of phones, as has been shown here, then there is no need to postulate a separate mechanism that filters output in terms of categorical phonotactic rules. Phonotactically illegal strings can simply be seen as extreme cases of low frequency, which the production system accordingly has a strong bias against. Dell et al. (1993) have demonstrated this in computer simulations of English CVC word production, in which the parallel distributed processing model did not contain any explicit phonotactic rules but derived a very strong tendency for phonotactically legal output (83–100%) solely on the basis of adequate input vocabulary and feedback mechanisms concerning the sequential progress.
5.2 Sonority
Clusters with a sonority distance of 1 had significantly higher error rates than all other clusters. Obviously, the effect is not monotonic. This is at odds with phonological theory, which predicts that error rates should be highest for clusters with a sonority distance of –1 (i.e., clusters that violate the SSP) and then drop steeply towards a sonority distance of 1 and more gradually, if at all, afterwards. This is because violations of the SSP are expected to be most problematic; SSP-violating clusters have been shown to be error-prone, at least for some populations (mostly speakers with apraxia of speech and/or aphasia, cf. Bastiaanse et al., 1994; Code & Ball, 1994; Miozzo & Buchwald, 2013; Romani et al., 2011, and partly in children during L1 acquisition, cf. Barlow, 2005; D. Ohala, 1999). However, according to the SDP, there are well-formedness differences even within the group of SSP-conforming clusters. The steeper the rise in sonority syllable-initially, the more well-formed the syllable. Therefore, one might even expect a slight decrease in error rates from clusters with a distance of 1 to 2 and 3. The data clearly show that this is not the case. Figure 2 reveals that among the clusters with a sonority distance of 1, it is the group of stop–sibilant clusters that causes this effect, while /ʃm/ and /ʃn/ have low error rates. Although the sonority profile of stop–sibilant clusters is not optimal, there is no reason from a sonority-theoretical perspective why they should cause more problems than their reversed counterparts, which violate the SSP. The difference in error rates thus cannot be satisfactorily explained by sonority theory, and the hypothesis that sonority-based well-formedness of initial consonant clusters affects their production accuracy in healthy adult speakers was not supported.
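The sonority distances discussed here can be reproduced with a simple lookup. The four-step scale below (stops < fricatives < nasals < liquids) is an assumption chosen because it yields the distances mentioned in the text (stop–sibilant and sibilant–nasal = 1, sibilant–stop = –1); the scale actually used is defined in the methods section and may differ in detail.

```python
# Assumed sonority scale for illustration: stops < fricatives < nasals < liquids.
SONORITY = {
    "p": 0, "t": 0, "k": 0,   # stops
    "f": 1, "s": 1, "ʃ": 1,   # fricatives (including sibilants)
    "m": 2, "n": 2,           # nasals
    "l": 3, "r": 3,           # liquids
}


def sonority_distance(c1: str, c2: str) -> int:
    """Sonority rise from the first to the second consonant of an onset."""
    return SONORITY[c2] - SONORITY[c1]


def violates_ssp(c1: str, c2: str) -> bool:
    """An onset violates the SSP if sonority falls towards the vowel."""
    return sonority_distance(c1, c2) < 0


for c1, c2 in [("t", "r"), ("f", "l"), ("ʃ", "m"), ("p", "s"), ("ʃ", "t")]:
    label = "violates SSP" if violates_ssp(c1, c2) else "conforms to SSP"
    print(f"/{c1}{c2}/: distance {sonority_distance(c1, c2)}, {label}")
```

On this scale, the problematic stop–sibilant clusters and the unproblematic sibilant–nasal clusters receive the same distance, which is exactly why sonority distance alone cannot separate them.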
In terms of error outcomes, sonority theory predicts (and it was hypothesised) that errors would improve sonority profiles, as can often be observed in apraxic speech errors (Romani et al., 2011). However, the opposite was the case in the data from the present experiment: in the majority of cases, the sonority profile was worsened by the speech error; only in a minority of cases was it improved or maintained. The outcome analysis therefore contradicted sonority predictions just as strongly as the analysis of error rates. Hence, sonority sequencing cannot account for the patterns found in the error data from the experiment. This is in line with previous research that finds sonority effects primarily in children and speech-impaired populations and for non-native clusters (Broselow & Finer, 1991). It suggests that, even under increased processing pressure, sonority sequencing does not influence healthy adults’ consonant cluster production. In contrast to children and L2 learners, healthy adults seem to have been desensitised to sonority-based biases by the overlearning of their native phonotactic system.
The pattern of internal errors that is at odds with sonority theory is interesting: in all metathesis pairs, the stop–sibilant cluster is substituted by the sibilant–stop cluster substantially more often than the reverse, which suggests that it is the structure of the clusters (i.e., their composition in terms of natural classes) that determines their strength. This is what obscured the frequency effect in metathesis pairs. On a more general scale, the error rates across clusters (see Figure 2) and false positives across clusters (see Figure 6) confirm this picture: on the one hand, the three stop–sibilant clusters have the highest error rates, and on the other hand, sibilant–stop clusters have the highest rates of false positives. In other words: sibilant–stop clusters appear to be strong in that they often act as intruders (even in non-contextual errors), while stop–sibilant clusters appear to be weak in that they are the most error-prone clusters. Initial /ts/, which might be argued to be an affricate in German phonology, does not deviate from the clusters /ps/, /ks/, and /tʃ/ in any way that could not be explained by its far higher frequency. The present data therefore give no reason to believe that its production is facilitated due to its potential affricate status, and it will be discussed along with /ps/, /ks/, and /tʃ/.
The special status of sibilant–stop clusters, both in terms of their distribution and their role in speech acquisition and processing, has received some scholarly attention over the past decades (Dziubalska-Kołaczyk, 2015; Goad, 2011; Morelli, 1999). In spite of the fact that they violate the SSP, they are relatively common cross-linguistically (Morelli, 1999), are acquired early in L1 acquisition (Dziubalska-Kołaczyk, 2015), and stand out phonetically and articulatorily (Browman & Goldstein, 1986; Byrd & Choi, 2010). In a speeded naming experiment, they had shorter response latencies than singleton /s/ onsets, although latencies had previously been found to increase with the number of phonemes (Kawamoto & Kello, 1999). Numerous accounts have been proposed regarding their structural representation to attempt to resolve the dilemma of such a common class of consonant clusters violating sonority sequencing. They range from an extra-syllabic position of the sibilant (Harris, 1994) to treating such sequences as single, complex segments (Browman & Goldstein, 1986; Fudge, 1969; Selkirk, 1982; Wiese, 2000; but see Treiman, 1986, for counter-evidence) or at least as having high intersegmental cohesiveness (Berg, 1989; Tzakosta, 2009).
In an interpretation of a pre-stop sibilant as extra-syllabic, the cluster’s cohesiveness would be very low and the cluster should not behave like a unit. One would therefore not expect it to show frequency effects, either. However, the data from the present experiment showed clear frequency effects for such clusters (compare, for example, error rates of HF clusters /ʃt/ and /ʃp/ with those of LF clusters /sk/ and /sp/). The explanation of extra-syllabicity is therefore at odds with the present findings. It is also in conflict with findings from articulatory studies (e.g., Bombien et al., 2010; Pouplier et al., 2022), in which German initial sibilant–stop clusters showed an intermediate to relatively large degree of gestural overlap and large variability.
An interpretation of the sibilant–stop clusters as having high internal cohesiveness and being more unit-like, on the other hand, would explain their relative strength in the error data from the tongue twister experiment. What remains to be explained, however, is why the reversed structure, stop–sibilant clusters, is so problematic in production.
A helpful approach is that of Tzakosta (2009), who makes a four-way distinction among complex onsets. Based on data from cluster reduction during Greek L1 acquisition, she discriminates between 1) true clusters (e.g., stop–liquid), 2) sibilant–stop clusters, 3) stop–sibilant clusters, and 4) affricates, to which she ascribes different degrees of cohesiveness (listed here in ascending order). The two cluster groups that stand out in the present experiment are thus contrasted with true clusters in her analysis. The most common repair strategy for true clusters is reduction that follows the principle of sonority optimisation, that is, the less sonorous consonant in C1 position is preserved, leading to a steep rise in sonority in the transition to the vowel. In the data from the adult speakers in the present experiment there is an overwhelming tendency for true clusters to follow the same principle. C1 is almost always preserved, while C2 is deleted relatively often. The cluster /ʃm/ is the only obstruent–sonorant cluster that does not follow this pattern. Here /ʃ/ is deleted more often than /m/.
Both sibilant–stop and stop–sibilant clusters, on the other hand, show the opposite trend: the less sonorous stop is deleted most often, which is a reduction strategy contra sonority principles (although, of course, reduction to the sibilant also constitutes a sonority improvement when compared to the full cluster). In terms of sonority, they thus behave unlike the “true” clusters.
A high rate of addition errors is also common to both groups; the majority of these errors are productions of C2–C1–C2 sequences (e.g., [sps] for target /ps/). Subjects often started to produce the wrong onset in metathesis pairs and then fused the two competitors into a more complex (and illegal) cluster. This fusion can be taken as an indication that both competitor clusters have a high level of cohesiveness. If stop–sibilant clusters, like sibilant–stop clusters, form a (relatively) cohesive unit, however, why are they by far the most problematic cluster group? The most straightforward explanation is that, as a cluster class, they are not native (i.e., they only occur as onsets of loan words) and have a low frequency of use. This would also explain why their apparent strength relative to sibilant–stop clusters in the present data is reversed when compared to Tzakosta’s hierarchy: stop–sibilant clusters are native to Greek.
The results of the current experiment add further evidence to a growing body of research that shows that sibilant–stop clusters and stop–sibilant clusters have a special status in a number of languages (e.g., Goad, 2011; Morelli, 1999) and behave differently in speech production as well. This means that, in addition to language-specific factors like cluster frequencies, a structural and potentially universal component does play a role in consonant cluster production: sibilant–stop clusters are easier to produce than stop–sibilant clusters. However, this component is not related to SSP violations but rather seems diametrically opposed to the concept of simplification through steady sonority growth within the syllable. What constitutes this component is still open for debate. The results of this experiment support an interpretation in terms of high intersegmental cohesiveness. Why these two cluster types are split up by speech errors more often than the other clusters needs further investigation; the opposite would be expected for highly cohesive units. The data on stop–sibilant clusters might also explain the scarcity of such clusters in the German language. Of course, it is difficult to distinguish between cause and effect with certainty, but it seems plausible that initial stop–sibilant clusters, which caused so many problems for the subjects in this study, are absent in native German words and rare in loan words precisely because they are difficult to pronounce.
694
If these special cases are left aside, though, an influence of sonority on error patterns is visible: in deletion errors in “normal” clusters, it was almost always the more sonorous consonant that was deleted, which created a steep rise in sonority in the onset. The expected effect of higher error rates on SSP-violating clusters and their repair as SSP-conforming clusters did not emerge, however, unless stops and fricatives are assigned the same sonority value and both cluster types discussed here are thus considered to be plateau clusters. In that case, however, the substantially higher error rates of stop–sibilant clusters (as opposed to sibilant–stop clusters) cannot be explained.
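The dependence of this argument on the choice of sonority scale can be made concrete with a small sketch. The numeric scale values, the toy segment inventory, and all function names below are illustrative assumptions and are not taken from the analyses reported here; the sketch only shows how a stop–sibilant and a sibilant–stop cluster change from a rise and a fall, respectively, to two plateaus once stops and fricatives receive the same value.

```python
from typing import Dict

# Manner class per segment (toy inventory, for illustration only).
MANNER: Dict[str, str] = {
    "p": "stop", "t": "stop", "k": "stop",
    "s": "fricative", "f": "fricative",
    "m": "nasal", "l": "liquid",
}

# Assumed scale A: fricatives are one step more sonorous than stops.
SCALE_SPLIT: Dict[str, int] = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4}
# Assumed scale B: all obstruents share a single sonority value.
SCALE_MERGED: Dict[str, int] = {"stop": 1, "fricative": 1, "nasal": 2, "liquid": 3}


def sonority_distance(c1: str, c2: str, scale: Dict[str, int]) -> int:
    """Sonority of C2 minus sonority of C1 under the given scale."""
    return scale[MANNER[c2]] - scale[MANNER[c1]]


def profile(distance: int) -> str:
    """Label an onset as a sonority rise, plateau, or fall."""
    if distance > 0:
        return "rise"
    if distance < 0:
        return "fall"
    return "plateau"


for c1, c2 in [("p", "s"), ("s", "p"), ("f", "l")]:
    split = sonority_distance(c1, c2, SCALE_SPLIT)
    merged = sonority_distance(c1, c2, SCALE_MERGED)
    print(f"/{c1}{c2}/: split scale {split} ({profile(split)}), "
          f"merged scale {merged} ({profile(merged)})")
```

Under the merged scale both /ps/ and /sp/ come out as plateaus, so sonority alone cannot distinguish them, which is the gap noted above.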
5.3 Cluster neighbourhood and competition
In contrast to many previous studies, there was no significant effect of neighbourhood frequency in the present one. However, results from past studies are inconclusive as to the direction of the effect, with some finding facilitating (Andrews, 1992; Harley & Bown, 1998; Vitevitch, 2002, 2003) and some inhibitory effects (Sadat et al., 2014; Vitevitch, 2007) of lexical neighbourhoods. For example, Sadat et al. (2014) differentiated between disrupted speech production, where neighbourhoods can have a facilitating effect, and normal speech production, where they usually have an inhibitory effect. The null effect in the present experiment could be explained by the fact that we tested unimpaired speakers, but with a production task that was far more demanding than normal speech production. This situation might have mimicked impaired speech production. For example, Meffert et al. (2011) show how aphasic behaviour can be induced in healthy speakers by applying production tasks with high cognitive demands. Hence, the inhibitory effect of dense, HF neighbourhoods typically found in healthy speech production may have emerged concurrently with facilitative effects caused by the demanding task, so that the two opposing forces cancelled each other out and neither could reach significance.
Alternatively, it is possible that the lack of a neighbourhood effect is due to the neighbourhood measure used in the present experiment, consonant cluster neighbourhood, which differs from the lexical neighbourhoods used in other studies. Here, neighbourhood frequency was defined on a sublexical level, namely as the summed frequencies of all initial clusters that differ from the target in one phonological feature. It could be that effects of neighbourhood frequency are limited to the lexical level and sublexical units simply have no neighbourhoods. This explanation is less likely, however, as the measure of cluster neighbourhood frequency did show the predicted inhibitory effect in two cluster perception experiments (Wulfert et al., submitted), thus demonstrating the appropriateness of the measure. Further research is needed to fully understand the role of lexical and sublexical neighbourhoods in speech production.
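The definition of cluster neighbourhood frequency given above (the summed frequencies of all initial clusters that differ from the target in one phonological feature) can be sketched as follows. The three-way feature bundles, the frequency counts, and the function names are hypothetical and serve illustration only; they are not the feature system or the corpus counts used in this study.

```python
from typing import Dict, Tuple

Cluster = Tuple[str, ...]

# Hypothetical, simplified feature bundles: (place, manner, voicing).
FEATURES: Dict[str, Tuple[str, str, str]] = {
    "p": ("labial", "stop", "voiceless"),
    "b": ("labial", "stop", "voiced"),
    "k": ("dorsal", "stop", "voiceless"),
    "g": ("dorsal", "stop", "voiced"),
    "l": ("coronal", "lateral", "voiced"),
}

# Made-up frequency counts, not real corpus data.
TOY_FREQ: Dict[Cluster, int] = {
    ("p", "l"): 50,
    ("b", "l"): 120,
    ("k", "l"): 200,
    ("g", "l"): 80,
}


def feature_distance(a: Cluster, b: Cluster) -> int:
    """Number of differing feature values across aligned segments.

    Clusters of unequal length are treated as non-neighbours (distance -1).
    """
    if len(a) != len(b):
        return -1
    return sum(
        fa != fb
        for seg_a, seg_b in zip(a, b)
        for fa, fb in zip(FEATURES[seg_a], FEATURES[seg_b])
    )


def neighbourhood_frequency(target: Cluster, freqs: Dict[Cluster, int]) -> int:
    """Sum the frequencies of all clusters exactly one feature away from the target."""
    return sum(
        freq for cluster, freq in freqs.items()
        if feature_distance(target, cluster) == 1
    )


print(neighbourhood_frequency(("p", "l"), TOY_FREQ))
```

In this toy inventory the neighbours of /pl/ are /bl/ (voicing of C1) and /kl/ (place of C1), whereas /gl/ differs in two features and is therefore excluded.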
5.4 Consonant metathesis
The metathesis pairs /ʃt–tʃ/, /sk–ks/, and /sp–ps/ had significantly higher error rates than all other pairs. This is most obvious when examining the clusters that occur both in a metathesis pair and in a non-metathesis pair. For all three clusters, error rates were significantly higher in the metathesis pair than in the non-metathesis pair. This shows that it is not only a cluster’s inherent difficulty that determines the error probability but also the degree of competition it receives during the planning of an utterance. Competition between alternative speech plans has long been recognised as part of the speech planning process (cf., for example, the Competing Plans Hypothesis, Baars, 1980). On the sublexical level investigated here, it often surfaced as both competing phones being produced (e.g., /fsl–/ in target /fl–sl/ pairs or /sps–/ in /sp–ps/ pairs), thus leading to the unusually high proportion of phonotactically illegal onsets (42%). This by far exceeds even recent findings that speech errors do not obey phonotactics as much as previously thought (Alderete & Tupper, 2018). Many previous studies (e.g., Pouplier & Goldstein, 2010) that primed for the production of a non-target consonant have reported an effect of “double articulation” even within a single segment, that is, constriction gestures as required for both competing consonants are carried out simultaneously. In the present study, double articulation could not be determined as articulatory gestures were not measured. But in addition to the unanalysed subphonemic errors giving the impression of double articulation, the sequences described above are evidence of successive articulation of gestures for both competing segments. The tendency for production of both competing consonants was found in all cluster pairs. However, it seems to be greatest when a consonant cluster alternates with its reversed counterpart. Note that, in contrast, /fl/ had similarly low error rates of 3.0% and 3.5% when competing with /sl/ and /ʃl/, respectively.
A likely explanation based on connectionist models of speech production (e.g., Dell, 1986; O’Seaghdha & Marin, 2000) is that both consonants of the onset clusters are strongly activated because they occur (at least) twice in the planned utterance. This increases the competition between them and can lead to the wrong segment being produced in situations of increased cognitive load. It is noteworthy that such strong competition was induced between the onset consonants in the metathesis pairs, while there was no increased competition between identical consonants in onset and coda position (e.g., /n/ or /t/ in the stimulus /ʃtaːn ʃnaːt/). This observation is in line with previous research that shows that contextual errors involve segments in the same syllable position (e.g., Fromkin, 1971; MacKay, 1978).
Alternatively, the production of both competing consonants could represent cases of immediate self-repair of the speech error. However, the fluency with which the successive consonants were produced speaks against this explanation. Literature on self-repairs reports time intervals of 150–350 ms on average (Plug & Carter, 2014; Hartsuiker & Kolk, 2001; Pillai, 2002) and a clear error–cutoff–repair sequence, which is not observable in the present data.
5.5 Other factors
While our analyses have revealed the influence of cluster frequency, segmental composition, competition, coda in the previous syllable (i.e., total length of the consonant cluster), and working memory on error rates, a considerable degree of variation is still not accounted for. For example, there was a relatively large variability between subjects. The differences in performance in the digit repetition task were only marginally significant and cannot account for it. Neither could a history of developmental conditions. The variability could be due to differences in speech rate or syllable duration, which we did not systematically analyse. Relatedly, a reviewer suggested that some of the between-subject variability in error rates could be due to differences in realisation of the stimuli as one or two prosodic units, to differences in coarticulation between the syllables, or to differences in working memory buffer. While we did not analyse differences in prosodic realisation, it could be interesting to test for such systematic effects in future studies with a similar task design.
6. Limitations of the study and future work
There are some limitations as to the generalisability of data from the experiment and the conclusions that can be drawn from them. A study that relies entirely on broad phonetic transcription must be interpreted with caution (see Frisch & Wright, 2002; Pouplier, Marin, & Kochetov, 2017) because many subphonemic errors cannot be captured by this coarse metric but only by fine articulatory or acoustic data. The inspection of the production data also clearly showed that many production errors were subphonemic articulation abnormalities that were impossible to accurately analyse with the methods applied here. It would be desirable to capture such errors more accurately with articulographic or acoustic measures and analyse them statistically in order to get a more precise picture of which clusters are the most stable and which ones the least.
The choice of the specific clusters used in the experiments may have contributed significantly to the results. A comparison of stop–nasal vs. stop–liquid clusters might have yielded quite different results concerning sonority than the comparison of stop–sibilant vs. sibilant–stop clusters did. The contrast chosen here revealed an important limitation to the validity of sonority sequencing and provided potential explanations for it.
Since Stemberger (1991, 2004) argued that frequency effects (albeit on the phoneme level) become apparent in nonce words, while anti-frequency effects arise in real words, it would be interesting to compare the results from this study with data from an analogous study with real words. If Stemberger is right, the pattern of results in such a parallel study should deviate considerably from that found here.
7. Conclusions
The aim of this study was to determine to what degree production accuracy of initial consonant clusters is influenced by their language-specific frequencies and to what degree by their sonority distance. We tested healthy adult speakers of German in a tongue twister paradigm, using pairs of pseudowords with initial consonant clusters as stimuli.
The experiment succeeded in eliciting contextual speech errors by introducing competition between similar clusters. In addition to an effect of complexity (operationalised as the number of consecutive phonemes), it demonstrated an effect of frequency, which has important implications for our understanding of how previous experience(s) and statistical reckoning are used during on-line cognitive processing. Frequency effects on accuracy are hard to find for the population studied here (adults without language impairment). By enforcing competition between two clusters, however, it was possible to reveal an effect of cluster frequency. Clusters of higher frequency tended to have both lower error rates and higher rates of “false positives” (i.e., were produced instead of the intended target), but there are a number of other factors that can distort this pattern. Most notably, some combinations of natural classes are very strong, while others are very error-prone, especially in a metathesis context.
In contrast to frequency, sonority sequencing did not show the predicted effect. Instead, only clusters with a sonority distance of 1 (specifically, stop–sibilant clusters) had a significantly higher error rate than all other clusters, which is at odds with sonority theory and demonstrates the importance of clusters’ composition with regard to natural classes. As regards the outcomes of speech errors, there was a clear split between well-formed clusters (i.e., those with an indisputable sonority rise) and obstruent–obstruent clusters: while the more sonorous consonant was usually deleted in the former, leading to sonority optimisation, speech errors in the latter often led to sonority deterioration. The direct comparison of error rates within cluster pairs confirmed that usage-based theory made better predictions for error rates than sonority theory.
Taken together, the results indicate that, in addition to structural aspects of the stimuli and competition between clusters composed of the same consonants, experience with language-specific distributions affects production accuracy in unimpaired adults, while sonority sequencing does not. Speakers have produced high-frequency clusters so often that they have become highly practised and their mental representations are probably stronger. This also causes speakers to produce them erroneously, as an easy default, instead of a lower-frequency target cluster.
Acknowledgements
This work was funded by the German Research Foundation (DFG) as part of the Research Training Group 1624 Frequency Effects in Language.
Data Availability Statement
The datasets generated and analysed during the current study are available in the Open Science Framework, https://osf.io/rmj5v/.
References
Aichert, I., & Ziegler, W. (2004). Syllable frequency and syllable structure in apraxia of speech. Brain and Language, 88(1), 148–159.
Andrews, S. (1992). Frequency and Neighborhood Effects on Lexical Access: Lexical Similarity or Orthographic Redundancy? Journal of Experimental Psychology, 18(2), 234–254.
Baars, B. J. (1980). The competing plans hypothesis: An heuristic viewpoint on the causes of errors in speech. Temporal Variables in Speech, 39–49.
Barlow, J. (2005). Sonority effects in the production of consonant clusters by Spanish speaking children. In Selected proceedings from the 6th Conference on the Acquisition of Spanish and Portuguese as First and Second Languages, 1–14.
Bastiaanse, R., Gilbers, D., & Van Der Linde, K. (1994). Sonority substitutions in Broca’s and conduction aphasia. Journal of Neurolinguistics, 8(4), 247–255.
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious Mixed Models. arXiv preprint arXiv:1506.04967, 1–27.
Béland, R., Caplan, D., & Nespoulous, J. L. (1990). The role of abstract phonological representations in word production: Evidence from phonemic paraphasias. Journal of Neurolinguistics, 5(2-3), 125–164.
Bell, A., Brenier, J. M., Gregory, M. L., Girand, C., & Jurafsky, D. (2009). Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language, 60(1), 92–111.
Berent, I. (2013). The phonological mind. Trends in cognitive sciences, 17(7), 319–327.
Berent, I., Steriade, D., Lennertz, T., & Vaknin, V. (2007). What we know about what we have never heard: Evidence from perceptual illusions. Cognition, 104, 591–630.
Berent, I., Harder, K., & Lennertz, T. (2011). Phonological universals in early childhood: Evidence from sonority restrictions. Language Acquisition, 18(4), 281–293.
Berg, T. (1989). Intersegmental cohesiveness. Folia Linguistica, 23(3-4), 245–280.
Boersma, P., & Weenink, D. (2018). Praat: doing phonetics by computer. http://www.praat.org
Bombien, L., Mooshammer, C., Hoole, P., & Kühnert, B. (2010). Prosodic and segmental effects on EPG contact patterns of word-initial German clusters. Journal of Phonetics, 38(3), 388–403.
Bose, A., van Lieshout, P., & Square, P. A. (2007). Word frequency and bigram frequency effects on linguistic processing and speech motor performance in individuals with aphasia and normal speakers. Journal of Neurolinguistics, 20(1), 65–88.
Brendel, B., Ziegler, W., Erb, M., Riecker, A., & Ackermann, H. (2008). Does our Brain House a “Mental Syllabary”? An fMRI Study. In Proceedings of the 8th International Seminar on Speech Production (ISSP), 73–76.
Broselow, E., & Finer, D. (1991). Parameter setting in second language phonology and syntax. Second Language Research, 7(1), 35–59.
Browman, C. P., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology Yearbook, 3(1986), 219–252.
Brysbaert, M., Buchmeier, M., Conrad, M., Jacobs, A. M., Bölte, J., & Böhl, A. (2011). The word frequency effect. Experimental Psychology, 58(5), 412–24.
Bürki, A., Cheneval, P. P., & Laganaro, M. (2015). Do speakers have access to a mental syllabary? ERP comparison of high frequency and novel syllable production. Brain and Language, 150, 90–102.
Bybee, J. L. (1999). Usage-based phonology. In M. Darnell, E. A. Moravcsik, M. Noonan, F. J. Newmeyer, & K. Wheatley (Eds.), Functionalism and formalism in linguistics (pp. 211–242). John Benjamins.
Bybee, J. (2010). Language, usage and cognition. Cambridge University Press.
Byrd, D., & Choi, S. (2010). At the juncture of prosody, phonology, and phonetics: The interaction of phrasal and syllable structure in shaping the timing of consonant gestures. Laboratory Phonology, 10, 31–60.
Cholin, J., Levelt, W. J. M., & Schiller, N. O. (2006). Effects of syllable frequency in speech production. Cognition, 99(2), 205–235.
Cholin, J., Schiller, N. O., & Levelt, W. J. (2004). The preparation of syllables in speech production. Journal of Memory and Language, 50(1), 47–61.
Christman, S. S. (1994). Target-Related Neologism Formation in Jargonaphasia. Brain and Language, 46, 109–128.
Clements, G. N. (1990). The role of the sonority cycle in core syllabification. Papers in laboratory phonology, 1, 283–333.
Code, C., & Ball, M. J. (1994). Syllabification in aphasic recurring utterances: contributions of sonority theory. Journal of Neurolinguistics, 8(4), 257–265.
Cohen-Goldberg, A. M. (2012). Phonological competition within the word: Evidence from the phoneme similarity effect in spoken production. Journal of Memory and Language, 67(1), 184–198.
Croot, K., & Rastle, K. (2004). Is there a syllabary containing stored articulatory plans for speech production in English. Proceedings of the 10th Australian International Conference on Speech Science and Technology, 376–381.
Dell, G. S., Burger, L. K., & Svec, W. R. (1997). Language production and serial order: A functional analysis and a model. Psychological Review, 104(1), 123–147.
Dell, G. S., Juliano, C., & Govindjee, A. (1993). Structure and content in language production: A theory of frame constraints in phonological speech errors. Cognitive Science, 17(2), 149–195.
Dziubalska-Kołaczyk, K. (2015). Are frequent, early and easy clusters also unmarked? Rivista di Linguistica, 27(1), 29–43.
Edwards, J., Beckman, M. E., & Munson, B. (2004). Vocabulary size and phonotactic production accuracy and fluency in nonword repetition. Journal of Speech, Language, and Hearing Research, 47(2), 421–436.
elexiko (2003ff.), OWID – Online Wortschatz-Informationssystem Deutsch, Leibniz-Institut für Deutsche Sprache (ed.), Mannheim, http://www.owid.de/wb/elexiko/start.html.
Frisch, S. A., & Wright, R. (2002). The phonetics of phonological speech errors: An acoustic analysis of slips of the tongue. Journal of Phonetics, 30(2), 139–162.
Fromkin, V. A. (1971). The Non-Anomalous Nature of Anomalous Utterances. Language, 47(1), 27–52.
Fudge, E. C. (1969). Syllables. Journal of Linguistics, 5(2), 253–286.
Garnham, A., Shillcock, R. C., Brown, G. D., Mill, A. I., & Cutler, A. (1982). Slips of the tongue in the London-Lund corpus of spontaneous conversation. Slips of the Tongue and Language Production, 251–264.
Gaskell, M. G., & Marslen-Wilson, W. D. (1997). Integrating Form and Meaning: A Distributed Model of Speech Perception. Language and Cognitive Processes, 12(5-6), 613–656.
Goad, H. (2011). The Representation of sC Clusters. The Blackwell Companion to Phonology, 1–26.
Goldrick, M. (2002). Pattern of sound, patterns in mind: Phonological regularities in speech production (Doctoral dissertation). Johns Hopkins University.
Goldrick, M. (2004). Phonological features and phonotactic constraints in speech production. Journal of Memory and Language, 51(4), 586–603.
Hanulikova, A., & Dietrich, R. (2008). Die variable Coda in der slowakisch-deutschen Interimsprache. In M. Tarvas (Ed.), Tradition und Geschichte im literarischen und sprachwissenschaftlichen Kontext (pp. 119–130). Bern: Peter Lang.
Harley, T. A., & Bown, H. E. (1998). What causes a tip-of-the-tongue state? Evidence for lexical neighbourhood effects in speech production. British Journal of Psychology, 89(1), 151–174.
Harris, J. (1994). English Sound Structure. Blackwell.
Hay, J., Pierrehumbert, J., & Beckman, M. (2004). Speech perception, well-formedness and the statistics of the lexicon. Papers in laboratory phonology VI, 58–74.
Hofmann, M. J., Stenneken, P., Conrad, M., & Jacobs, A. M. (2007). Sublexical frequency measures for orthographic and phonological units in German. Behavior Research Methods, 39(3), 620–629.
Humphreys, K. R., Menzies, H., & Lake, J. K. (2010). Repeated speech errors: Evidence for learning. Cognition, 117(2), 151–165.
Jescheniak, J. D., & Levelt, W. J. M. (1994). Word frequency effects in speech production: Retrieval of syntactic information and of phonological form. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(4), 824–843.
Kawamoto, A. H., & Kello, C. T. (1999). Effect of onset cluster complexity in speeded naming: A test of rule-based approaches. Journal of Experimental Psychology: Human Perception and Performance, 25(2), 361–375.
Kohn, S. E., Melvold, J., & Shipper, V. (1998). The preservation of sonority in the context of impaired lexical-phonological output. Aphasiology, 12(4-5), 375–398.
Laganaro, M., & Alario, F. X. (2006). On the locus of the syllable frequency effect in speech production. Journal of Memory and Language, 55(2), 178–196.
Leuninger, H. (1993). Reden ist Schweigen, Silber ist Gold. Gesammelte Versprecher (2. Auflage). Deutscher Taschenbuch Verlag GmbH & Co. KG.
Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22(1), 1–38, discussion 38–75.
Levelt, W. J. M., & Wheeldon, L. (1994). Do speakers have access to a mental syllabary? Cognition, 50(1-3), 239–269.
Levitt, A. G., & Healy, A. F. (1985). The roles of phoneme frequency, similarity, and availability in the experimental elicitation of speech errors. Journal of Memory and Language, 24(6), 717–733.
MacKay, D. G. (1978). Speech errors inside the syllable. In A. Bell & J. B. Hooper (Eds.), Syllables and segments (pp. 201–212). North-Holland.
Marian, V., Bartolotti, J., Chabal, S., & Shook, A. (2012). CLEARPOND: Cross-linguistic easy-access resource for phonological and orthographic neighborhood densities. PLOS One, 7(8), 1–11.
Mathôt, S., Schreij, D., & Theeuwes, J. (2012). OpenSesame: An open-source graphical experiment builder for the social sciences. Behavior Research Methods, 44(2), 314–324.
Meffert, E., Tillmanns, E., Heim, S., Jung, S., Huber, W., & Grande, M. (2011). Taboo: A Novel Paradigm to Elicit Aphasia-Like Trouble-Indicating Behaviour in Normally Speaking Individuals. Journal of Psycholinguistic Research, 40(5), 307–326.
Miozzo, M., & Buchwald, A. (2013). On the nature of sonority in spoken word production: Evidence from neuropsychology. Cognition, 128(3), 287–301.
Mooshammer, C., Tiede, M., Katsika, A., & Goldstein, L. (2015). Effects of phonological competition on speech planning and execution. In The Scottish Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th international congress of phonetic sciences (pp. 6–10). University of Glasgow.
Morelli, F. (1999). The phonotactics and phonology of obstruent clusters in Optimality Theory (Doctoral dissertation).
Motley, M. T., & Baars, B. J. (1975). Encoding Sensitivities to Phonological Markedness and Transitional Probability: Evidence From Spoonerisms. Human Communication Research, 1(4), 353–361.
Munson, B. (2000). Phonological pattern frequency and speech production in children and adults. The Ohio State University.
Munson, B. (2001). Phonological Pattern Frequency and Speech Production in Children and Adults. Journal of Speech, Language, and Hearing Research, 44(4), 778–792.
Nooteboom, S. G. (1973). The Tongue Slips into Patterns. In V. A. Fromkin (Ed.), Speech errors as linguistic evidence (pp. 144–156). Mouton.
Ohala, D. K. (1999). The influence of sonority on children’s cluster reductions. Journal of Communication Disorders, 32(6), 397–422.
Oldfield, R. C., & Wingfield, A. (1965). Response latencies in naming objects. Quarterly Journal of Experimental Psychology, 17(4), 273–281.
O'Seaghdha, P. G., & Marin, J. W. (2000). Phonological competition and cooperation in form-related priming: sequential and nonsequential processes in word production. Journal of Experimental Psychology: Human perception and performance, 26(1), 57.
Pouplier, M., Marin, S., Hoole, P., & Kochetov, A. (2017). Speech rate effects in Russian onset clusters are modulated by frequency, but not auditory cue robustness. Journal of Phonetics, 64, 108–126.
Pouplier, M., Marin, S., & Kochetov, A. (2017). The difficulty of articulatory complexity. Cognitive Neuropsychology, 34(7-8), 472–475.
Pouplier, M., Pastätter, M., Hoole, P., Marin, S., Chitoran, I., Lentz, T. O., & Kochetov, A. (2022). Language and cluster-specific effects in the timing of onset consonant sequences in seven languages. Journal of Phonetics, 93, 1–28.
R Core Team. (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.r-project.org/
Reason, J. T. (1992). Cognitive Underspecification. Its Variety and Consequences. In B. J. Baars (Ed.), Experimental slips and human error: Exploring the architecture of volition (pp. 71–91). Plenum Press.
Reetz, H., & Jongman, A. (2009). Phonetics. Transcription, Production, Acoustics, and Perception. Wiley-Blackwell.
Richtsmeier, P. (2011). Word-types, not word-tokens, facilitate extraction of phonotactic sequences by adults. Laboratory Phonology, 2(1), 157–183.
Roelofs, A. (1997). The WEAVER model of word-form encoding in speech production. Cognition, 64(3), 249–284.
Romani, C., & Calabrese, A. (1998). Syllabic constraints in the phonological errors of an aphasic patient. Brain and Language, 64(1), 83–121.
Romani, C., Galluzzi, C., Bureca, I., & Olson, A. (2011). Effects of syllable structure in aphasic errors: Implications for a new model of speech production. Cognitive Psychology, 62(2), 151–192.
Romani, C., Galluzzi, C., Goslin, J., Bureca, I., & Olson, A. (2013). Sonority, Frequency and Markedness in Errors of Aphasic Patients. Procedia - Social and Behavioral Sciences, 94(0), 55–56.
Sadat, J., Martin, C. D., Costa, A., & Alario, F. X. (2014). Reconciling phonological neighborhood effects in speech production through single trial analysis. Cognitive Psychology, 68, 33–58.
Santiago, J., Pérez, E., Palma, A., & Stemberger, J. P. (2007). Syllable, Word, and Phoneme Frequency Effects in Spanish phonological speech errors: The David effect on the source of the error. MIT Working Papers in Linguistics, 53, 265–303.
Selkirk, E. O. (1981). On the Nature of Phonological Representation. In T. Myers, J. D. M. Laver, & J. Anderson (Eds.), The cognitive representation of speech (pp. 379–388). North-Holland Publishing Company.
Selkirk, E. O. (1982). The syllable. In H. van der Hulst & N. Smith (Eds.), The structure of phonological representations (part ii) (pp. 337–383). Foris Publications.
Shattuck-Hufnagel, S., & Klatt, D. H. (1979). The limited use of distinctive features and markedness in speech production: evidence from speech error data. Journal of Verbal Learning and Verbal Behavior, 18(1), 41–55.
Shuster, L. I. (2009). The effect of sublexical and lexical frequency on speech production: An fMRI investigation. Brain and Language, 111(1), 66–72.
Sievers, E. (1897). Grundzüge der Lautphysiologie zur Einführung in das Studium der Lautlehre der indogermanischen Sprachen. Breitkopf & Härtel.
Smith, J. L., & Moreton, E. (2012). Sonority variation in Stochastic Optimality Theory: Implications for markedness hierarchies. In S. Parker (Ed.), The sonority controversy, 167–194.
Stemberger, J. P. (1991). Apparent anti-frequency effects in language production: The addition bias and phonological underspecification. Journal of Memory and Language, 30, 151–185.
Stemberger, J. P. (2004). Neighbourhood effects on error rates in speech production. Brain and Language, 90(1-3), 413–422.
Stenneken, P., Bastiaanse, R., Huber, W., & Jacobs, A. M. (2005). Syllable structure and sonority in language inventory and aphasic neologisms. Brain and Language, 95(2), 280–292.
Treiman, R. (1986). The Division between Onsets and Rimes in English. Journal of Memory and Language, 25(4), 476–491.
Tzakosta, M. (2009). Asymmetries in /s/ cluster production and their implications for language learning and language teaching. 18th ISTAL, (2000), 335–373.
Ulbrich, C., Alday, P. M., Knaus, J., Orzechowska, P., & Wiese, R. (2016). The role of phonotactic principles in language processing. Language, Cognition and Neuroscience, 31(5), 662–682.
van Casteren, M., & Davis, M. H. (2006). Mix, a program for pseudorandomization. Behavior research methods, 38(4), 584–589.
Vitevitch, M. S. (2002). Naturalistic and Experimental Analyses of Word Frequency and
1061
Neighborhood Density Effects in Slips of the Ear. Language and Speech, 45(4), 407–434.
1062
Vitevitch, M. S. (2003). The influence of sublexical and lexical representations on the processing
1063
of spoken words in English. Clinical Linguistics & Phonetics, 17(6), 487–499.
1064
Vitevitch, M. S. (2007). The spread of the phonological neighborhood influences spoken word recognition. Memory and Cognition, 35(1), 166–175.
Vitevitch, M. S., & Luce, P. A. (1998). When Words Compete: Levels of Processing in Perception of Spoken Words. Psychological Science, 9(4), 325–329.
Vitevitch, M. S., & Stamer, M. (2006). The curious case of competition in Spanish speech production. Language and Cognitive Processes, 21(6), 760–770.
Vousden, J. I., & Maylor, E. (2006). Speech errors across the lifespan. Language and Cognitive Processes, 21(1–3), 48–77.
Wade, T., Dogil, G., Schütze, H., Walsh, M., & Möbius, B. (2010). Syllable frequency effects in a context-sensitive segment production model. Journal of Phonetics, 38, 227–239.
Wiese, R. (2000). The phonology of German. Oxford University Press.
Wilshire, C. E. (1998). Serial order in phonological encoding: An exploration of the ‘word onset effect’ using laboratory-induced errors. Cognition, 68(2), 143–166.
Wulfert, S., Auer, P., & Hanulíková, A. (manuscript submitted for publication). Frequency of use and sonority sequencing in consonant cluster perception: Facilitation is language-specific.
Yavaş, M. (2003). Role of sonority in developing phonologies. Journal of Multilingual Communication Disorders, 1(2), 79–98.
Ziegler, W., Aichert, I., & Staiger, A. (2017). When words don't come easily: A latent trait analysis of impaired speech motor planning in patients with apraxia of speech. Journal of Phonetics, 64, 145–155.
Figure captions

Figure 1
Overview of the consonant clusters used in the study, showing their log frequency distribution and sonority distance

Figure 2
Error rates over individual target clusters (in descending order of frequency)

Figure 3
Error rates over clusters grouped by pair

Figure 4
Effect of log type cluster frequency

Figure 5
Effect of sonority distance

Figure 6
False positive rates of the test clusters pooled across all target clusters