Conference Paper

Acoustics of Spanish and English coronal stops


Abstract

This study explores the acoustic correlates that distinguish coronal stops (/t/, /d/) between English and Spanish. English and Spanish coronal stops are hypothesized to differ in terms of voice-onset time and place of articulation. We are particularly concerned with capturing the place of articulation difference with acoustic data, as the voice-onset time difference is well known. Specifically, we focus on English /d/ and Spanish /t/, which are phonetically-voiceless stops with a short-lag voice-onset time. Spanish /t/ has been described as being articulated at dental place, whereas English /d/ is articulated at alveolar place. Mixed-effects models explored various spectral measurements of the consonant burst and found that standard deviation, relative burst intensity, and center of gravity differed as a function of place of articulation (or language).
Joseph V. Casillas, Yamile Díaz and Miquel Simonet
University of Arizona, Tucson
{jvcasill, ydiaz44, simonet}
Keywords: Coronal stops, Spectral moments, VOT,
Spanish, English
Spanish and English both have coronal stops (/d, t/);
however, their phonetic implementation differs. En-
glish /d/ and /t/ are produced with an alveolar place
of articulation (POA) [15]. Spanish /d/ and /t/, on
the other hand, are both produced with a dental POA
[9]. These descriptions rest mostly on impressionis-
tic observations. This investigation sets out to ex-
plore the acoustic correlates related to place differ-
ences amongst these segments.
An important difference between Spanish and English has to do with their treatment of the stop voicing distinction common to these languages. Both distinguish between /t/ and /d/ by exploiting the acoustic correlate voice-onset time (VOT), that is, the acoustic output of the coordination of glottal and supra-glottal gestures that results in a time difference between the onset of modal voicing and articulatory release. However, the manner in which they exploit VOT differs between the two languages. In English, /d/ has a short-lag VOT and /t/ has a long-lag VOT, whereas in Spanish /d/ has a lead VOT (prevoicing) and /t/ has a short-lag VOT [13, 20].
The question of how bilinguals who speak Span-
ish and English use VOT to distinguish voiced and
voiceless stops in their two languages has been in-
vestigated at length [19, 20]; however, studies to
date have overlooked the fact that bilinguals would
need to produce a difference in POA, in addition to
differently exploiting VOT, in order to avoid coa-
lescing English and Spanish coronal stops. To date,
few analyses have used acoustic measures to inves-
tigate differences in POA of coronal stops; two ex-
ceptions are [17] and [18].
The fact that both Spanish and English have short-lag, phonetically voiceless stops in their sound inventory (/t/ and /d/, respectively) raises the question as to whether these two segments can be distinguished by any acoustic measures. Accounting
for POA differences via acoustic metrics opens the
door for new areas of research regarding bilingual-
ism and second-language learning. Similarly to
[17], the goal of the present study is to try to cap-
ture the hypothesized place difference between En-
glish and Spanish with acoustic data. The present
study focuses on monolingual speakers of both lan-
guages. Our future goals include studying the be-
havior of Spanish-English bilinguals to determine
whether they exploit the place differences between
Spanish and English coronal stops, similarly to [18].
The first four spectral moments—center of grav-
ity (COG), standard deviation (SD), skewness (SK),
and kurtosis (KT)—provide acoustic measurements
related to the shape of a spectrum (i.e. how the
energy is distributed across frequency bands) [10].
Various investigators have used spectral moments to
distinguish between place differences in fricatives
[7, 11]; however, [17] is one of a small number of
studies to use spectral moments in order to analyze
place differences in stops. Specifically, Sundara ex-
amined coronal stops in French and English, which
(similarly to Spanish and English) are realized with
dental and alveolar place, respectively. Her investi-
gation found differences between the two languages
in relative burst intensity, COG, SD, and KT that
were triggered by differences in POA. It remains
an open question whether place differences between
Spanish and English can be accounted for in the
same manner.
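As a rough illustration of how the four moments summarize spectral shape, the sketch below treats a power spectrum as a set of weights over frequency and computes COG, SD, SK, and KT from it. The function name `spectral_moments` and the toy spectrum are illustrative assumptions, not the procedure used in the study. Kurtosis is computed here as excess kurtosis (subtracting 3), which is also an assumption.

```python
import math

def spectral_moments(freqs_hz, power):
    """Return (COG, SD, skewness, excess kurtosis) of a power spectrum.

    The power values are normalized so that they act as weights
    (a probability distribution) over frequency.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in power]

    # First moment: center of gravity (weighted mean frequency)
    cog = sum(w * f for w, f in zip(weights, freqs_hz))

    def central(k):
        # k-th central moment around the center of gravity
        return sum(w * (f - cog) ** k for w, f in zip(weights, freqs_hz))

    variance = central(2)
    sd = math.sqrt(variance)
    skewness = central(3) / variance ** 1.5
    kurtosis = central(4) / variance ** 2 - 3.0  # excess kurtosis (assumption)
    return cog, sd, skewness, kurtosis

# Toy spectrum (hypothetical values): energy symmetric around 3000 Hz
freqs = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
power = [1.0, 2.0, 4.0, 2.0, 1.0]
cog, sd, sk, kt = spectral_moments(freqs, power)
```

For this symmetric toy spectrum the center of gravity is 3000 Hz and the skewness is zero. A spectrum with more energy at high frequencies would raise COG and push skewness negative.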
The goal of this investigation was to explore the
acoustic correlates that differentiate Spanish from
English coronal stops. We measured VOT, the first
four spectral moments, and relative burst intensity
(see below). After establishing the expected differ-
ences in terms of VOT, we focused on an analysis of
only the two short-lag stops (English /d/ and Span-
ish /t/). Of particular interest was the relative impor-
tance of each of these spectral measures with regard
to POA differences across the two languages.
2.1. Speakers
In order to address the aforementioned issues, we
recorded the speech of 16 female participants. Eight
were native Spanish speakers between the ages of
18 and 23, all of whom were recruited from the
Universitat de les Illes Balears campus community
and were born and raised on the island of Majorca,
Spain. Eight were native English speakers and were
undergraduate students at the University of Arizona,
born and raised in the US Southwest. The Spanish
speakers had studied some English in Spain, and the
English speakers had studied some Spanish in the
U.S., but none of the speakers were able to maintain
a basic conversation in their “second” language.
2.2. Materials and Procedure
We devised a list of 48 target words, 24 in English
and 24 in Spanish. The target words contained the
voiced and voiceless coronal stops of both languages
in word-initial position. For each language there was
a total of 24 words, 12 beginning with /d/ and 12
beginning with /t/, equally divided between stressed
and unstressed syllables. All stops were followed by
a low vowel (/a/ for Spanish and /æ, ɑ/ for English).
(See [17].)
In order to collect the acoustic data we used
the “delayed repetition technique” widely used in
bilingual-speech research [5]. The materials were
read by 6 male native speakers of these languages: 3
native English speakers (recorded in Austin, Texas)
and 3 native Spanish speakers (recorded on Majorca,
Spain). These acoustic materials were used as auditory
stimuli to be repeated out loud by the 16 female
speakers whose speech is analyzed here.
The speakers produced the target words in the car-
rier phrase “_ is the word” or the Spanish equivalent
(“_ es la palabra”). All words not containing coro-
nal stops were considered distractors. The computer
program Praat [3] presented the sentences randomly
in auditory form and the speakers were asked to listen
to the entire sentence and then repeat it out loud
after a beep at their own pace. They were not asked
to imitate the voices of the male talkers, but to pro-
duce the sentences in their “own way.”
The English data were recorded in a sound-attenuated
booth on the campus of the University of Arizona.
The Spanish data were obtained in a quiet
classroom on the campus of the Universitat de les
Illes Balears. In order to carry out the recordings
we used a Shure SM10A dynamic head-mounted
microphone, a Sound Devices MM-1 microphone
pre-amplifier and a Marantz PMD660 digital speech
recorder. The signal was digitized at 44.1 kHz and
16-bit quantization.
Each participant provided the dataset with 72 coronal stops (24 target words × 3 repetitions). Thus, a total of 1,152 tokens were recorded (24 words × 3 repetitions × 16 participants = 1,152 stops). Our initial analysis of VOT utilizes the entire dataset; however, for subsequent analyses of burst measurements we took a subset of these data (exactly half) containing only Spanish /t/ and English /d/, as these are the stops that cannot easily be distinguished by VOT. Five tokens were removed due to mispronunciations or extraneous noise, leaving a total of 571 tokens for burst analyses.
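The token counts can be checked with a few lines of arithmetic. The variable names below are illustrative only.

```python
WORDS_PER_LANGUAGE = 24   # target words produced by each participant
REPETITIONS = 3
PARTICIPANTS = 16
REMOVED = 5               # mispronunciations or extraneous noise

tokens_per_participant = WORDS_PER_LANGUAGE * REPETITIONS
total_tokens = tokens_per_participant * PARTICIPANTS
short_lag_subset = total_tokens // 2          # Spanish /t/ and English /d/ only
burst_tokens = short_lag_subset - REMOVED

print(tokens_per_participant, total_tokens, short_lag_subset, burst_tokens)
# prints: 72 1152 576 571
```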
2.3. Measurements
The digitized sound files were low-pass filtered at
11.025 kHz. For each of the coronal stops, synchro-
nized waveform and spectrographic displays were
used to mark the onset of modal voicing and of the
burst. The onset of voicing was taken to be the
upwards zero-crossing of the first periodic pattern
found in the oscillogram [12]. Voice-onset time was
calculated as the difference (in ms) between the on-
set of modal voicing and the onset of the burst.
Unlike in [17], the duration of the burst was de-
termined semi-automatically. For short-lag stops,
the burst was equal to the duration of VOT (see
above). Thus, in cases in which VOT was positive
but smaller than 25 ms, the burst was variable. For
stops with long-lag and lead VOT, the burst was ex-
actly 25 ms. That is, the onset of the burst was deter-
mined by hand, but the offset of the burst was set to
occur 25 ms after the onset—i.e., the longest bursts
were 25 ms.
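The VOT calculation and the semi-automatic burst window described above can be sketched as follows. The function names and the example time points are hypothetical. The sketch assumes that landmark times are stored in seconds and that lead VOT is coded as a negative value.

```python
MAX_BURST_MS = 25.0  # upper limit on the burst window

def vot_ms(burst_onset_s, voicing_onset_s):
    """VOT in ms: onset of modal voicing minus onset of the burst.

    Negative values correspond to lead VOT (prevoicing).
    """
    return (voicing_onset_s - burst_onset_s) * 1000.0

def burst_window_ms(vot):
    """Duration of the burst window that starts at the burst onset.

    Short-lag stops (0 < VOT < 25 ms) use the VOT itself;
    long-lag and lead stops use a fixed 25 ms window.
    """
    if 0 < vot < MAX_BURST_MS:
        return vot
    return MAX_BURST_MS

# Hypothetical tokens: short-lag, long-lag, and lead VOT
short_lag = vot_ms(0.500, 0.516)   # roughly 16 ms
long_lag = vot_ms(0.500, 0.578)    # roughly 78 ms
lead = vot_ms(0.500, 0.436)        # roughly -64 ms
windows = [burst_window_ms(v) for v in (short_lag, long_lag, lead)]
```

Only the short-lag token receives a variable window. The other two are capped at 25 ms.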
In order to calculate relative burst intensity (RI),
we extracted the intensity of the burst (in dB), as
well as the mean intensity of the following vowel,
which was also manually segmented. RI was the dif-
ference between the intensity of the vowel and the
intensity of the burst. All spectral measures (COG,
SD, SK, KT) were derived from the spectral envelope
(which ranged from 60 Hz to 11.025 kHz). Tokens
for which a clear burst could not be established
were removed.
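A minimal sketch of the RI calculation follows, assuming that intensity is obtained as the mean squared amplitude of a stretch of samples expressed in dB. The function names, the reference value, and the sample values are assumptions made for illustration. The study does not state how intensity was extracted.

```python
import math

def mean_intensity_db(samples, ref=2e-5):
    """Mean intensity in dB of a stretch of samples (relative to `ref`)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square / ref ** 2)

def relative_intensity_db(burst_samples, vowel_samples):
    """RI = intensity of the vowel minus intensity of the burst, in dB."""
    return mean_intensity_db(vowel_samples) - mean_intensity_db(burst_samples)

# Hypothetical amplitudes: the vowel is ten times stronger than the burst
burst = [0.01, -0.01, 0.01, -0.01]
vowel = [0.1, -0.1, 0.1, -0.1]
ri = relative_intensity_db(burst, vowel)   # about 20 dB
```

Because RI is a difference of two dB values, the reference level cancels out, so the choice of `ref` does not affect the result.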
2.4. Statistical analyses
A linear mixed-effects model was fit to each acous-
tic measure detailed above (i.e. VOT, RI, COG,
SD, SK, and KT). The analysis of VOT included
the entire dataset. Language (Spanish, English) and
consonant (/d/, /t/) were fixed effects. Individual
speaker and word items were random effects [1], with random slopes for subjects and items for the effect of consonant [2]. Statistical significance of language, consonant, and the language by consonant interaction was assessed using hierarchical partitioning of variance via nested model comparisons. Simultaneous
Tests for General Linear Hypotheses analyzed all
pairwise comparisons using Tukey Contrasts with
adjusted p-levels.
Subsequent analyses of residualized burst mea-
surements only included Spanish /t/ and English /d/
data. The present study was concerned with an-
alyzing the acoustic correlates (aside from VOT)
that could account for POA differences. Because
all burst metrics are directly related to VOT, the
effect of this variable was partialed out of the
burst measurements as a method of reducing multi-
collinearity between predictors. In order to accom-
plish this, separate models were fit with each burst
measurement regressed on VOT. The residuals of these models were then used as the predictors for all analyses.¹ Thus, each omnibus model directly compared Spanish /t/ to English /d/.² In each case, individual speaker and word items were given random intercepts. We report marginal R² and conditional R² as an indication of goodness of fit for all models [14]. Marginal R² provides a measure of the variance explained by the fixed effects alone, and conditional R² includes the random effects as well.
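The residualization step and the two R² measures can be sketched as below. This is a simplified illustration: it uses ordinary least squares for the regression on VOT, which is an assumption, and the variance components passed to `r2_marginal_conditional` are made-up numbers. Both function names are hypothetical.

```python
def residualize(y, x):
    """Residuals of the simple linear regression y ~ x (ordinary least squares)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def r2_marginal_conditional(var_fixed, var_random, var_residual):
    """Marginal and conditional R² from variance components (cf. [14])."""
    total = var_fixed + var_random + var_residual
    marginal = var_fixed / total                    # fixed effects only
    conditional = (var_fixed + var_random) / total  # fixed plus random effects
    return marginal, conditional

# Hypothetical data: VOT (ms) and center of gravity (Hz) for five tokens
vot = [10.0, 14.0, 18.0, 22.0, 26.0]
cog = [3100.0, 3350.0, 3300.0, 3600.0, 3650.0]
cog_resid = residualize(cog, vot)

# Hypothetical variance components
r2m, r2c = r2_marginal_conditional(0.2, 0.3, 0.5)
```

By construction the residuals have a mean of zero and are uncorrelated with VOT, so whatever differs between languages in `cog_resid` cannot be attributed to a linear effect of VOT.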
The second analysis explored the extent to which
each of the acoustic measures could provide use-
ful information about the POA of the phonetically
voiceless segments. To this end, we divided the
dataset into two subsets of Spanish /t/ and English
/d/ stops: a training set, comprising 75% of the
data, and a testing set, comprising 25% of the data.
We then used RI, COG, SD, SK and KT as predic-
tors in a forward selection logistic regression model
in which phoneme identity (Spanish /t/, English /d/)
was the criterion variable. Causal priority was given
to the correlates found to best predict POA in [17].
After building the model on the training subset, we
used cross-validation to predict the identity of the
stops in the testing subset.
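The 75/25 hold-out procedure can be sketched as follows. How the tokens were assigned to each subset is not stated, so the random shuffle, the seed, and the function names are assumptions. The logistic regression itself is omitted. The sketch covers only the split and the accuracy score.

```python
import random

def train_test_split(items, test_fraction=0.25, seed=1):
    """Shuffle a copy of `items` and hold out `test_fraction` for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predicted, actual):
    """Proportion of tokens whose predicted label matches the actual label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

token_ids = list(range(571))            # the 571 short-lag tokens
train, test = train_test_split(token_ids)
print(len(train), len(test))            # prints: 429 142

# Hypothetical predictions for four tokens
example_accuracy = accuracy(["t", "d", "t", "d"], ["t", "d", "d", "d"])
```

The out-of-sample error is simply one minus the accuracy on the held-out tokens.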
3.1. VOT
The analysis of the VOT data revealed a main effect of language (χ²(2) = 44.07; p < 0.001), consonant (χ²(2) = 56.39; p < 0.001), as well as a language by consonant interaction (χ²(1) = 10.54; p < 0.002). Pairwise comparisons showed that all of the coronal stops differed from each other (p < 0.001), with the exception of the Spanish /t/ vs. English /d/ short-lag stops. The mixed model provided the best fit for the data (conditional R² = 0.87; marginal R² = 0.82). Figure 1A plots VOT as a function of language and consonant. The light gray box shows that Spanish /d/ was produced with lead VOT (x̄ = 64.48 ± 28.56 SD), and the dark gray box shows that English /t/ was produced with long-lag VOT (x̄ = 77.63 ± 25.63 SD). The white boxes represent the short-lag stops of English and Spanish. VOT for English /d/ was slightly longer (x̄ = 22.13 ± 26.69 SD) than Spanish /t/ (x̄ = 16.18 ± 5.08 SD); however, this difference was negligible, likely due to the high rate of variability for English /d/. Thus, in these data VOT can account for differences between all coronal stops except for those that are manifested through short-lag VOT: English /d/ and Spanish /t/.
3.2. Burst measurements
Figure 1B plots RI of Spanish /t/ and English /d/. The data were best fit using the mixed-effects model (conditional R² = 0.66; marginal R² = 0.20). The analysis revealed that English /d/ was 4.81 dB ± 1.77 standard errors (SE) higher than Spanish /t/ (t = -2.72; p < 0.02).
The COG data were also best fit using the mixed-effects model (conditional R² = 0.68; marginal R² = 0.19). Spanish /t/ was 1131 Hz ± 455 SE lower than the average English /d/ (t = -2.48, p < 0.03; see Figure 1C). Regarding SD, the mixed-effects model accounted for 61% of the variance (vs. marginal R² of 26%). The SD values for Spanish /t/ were 763 Hz ± 222 SE lower than English /d/ (t = -3.44, p < 0.004; see Figure 1D). The analysis of SK (Figure 1E) revealed that Spanish /t/ was 2.10 ± 0.87 SE units higher than English /d/ (t = 2.40; p < 0.03). Again, the data were best fit using the mixed-effects model (conditional R² = 0.45; marginal R² = 0.12). Lastly, the KT data had the least amount of variance explained by the model (conditional R² = 0.26; marginal R² = 0.06). Spanish /t/ was 29.02 ± 14.30 SE units higher than English /d/; however, this difference was not significant at our specified alpha level (t = 2.03, p = 0.06; see Figure 1F). In sum, all of the burst measurements, with the exception of kurtosis, differed as a function of language. This is taken as an indication that these metrics successfully accounted for place differences between the short-lag stops of English and Spanish.
Figure 1: VOT and burst measures of Spanish and English coronals. Panels: (A) VOT, (B) relative intensity, (C) center of gravity, (D) standard deviation, (E) skewness, (F) kurtosis; each panel compares English and Spanish.
The next step was to analyze the relative con-
tribution of the burst measurements. The training
subset of the data was analyzed via logistic regres-
sion, with the burst metrics as fixed effects for pre-
dicting the short-lag stop phoneme identity (Spanish
/t/, English /d/). The model eliminated SK and KT
from the analysis. Table 1 summarizes the results.
Nagelkerke's pseudo R² is reported to give an indication of goodness-of-fit of each predictor as it was entered into the model. SD accounted for the largest amount of the variance (35%), followed by RI (5%) and COG (2%).
Table 1: Regression analysis of /d/-/t/.
Metric  R²    R² change  χ² change  p-value
SD      .353  .353       175.39     < 0.001
COG     .376  .023       14.01      < 0.001
RI      .425  .049       30.01      < 0.001
SK      .426  .001       0.08       > 0.05
KT      .432  .006       4.36       < 0.04
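The R² change column in Table 1 is the difference between successive cumulative pseudo R² values, which the sketch below recomputes. It also converts a χ² change into a p-value under the assumption that each step adds one parameter (1 degree of freedom). The table does not state the degrees of freedom, and the function names are illustrative.

```python
import math

# Cumulative pseudo R² after each predictor was entered (from Table 1)
cumulative_r2 = [("SD", 0.353), ("COG", 0.376), ("RI", 0.425),
                 ("SK", 0.426), ("KT", 0.432)]

def r2_changes(steps):
    """Increment in R² contributed by each predictor, in order of entry."""
    changes = []
    previous = 0.0
    for name, r2 in steps:
        changes.append((name, round(r2 - previous, 3)))
        previous = r2
    return changes

def chi2_p_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

changes = r2_changes(cumulative_r2)
p_sk = chi2_p_df1(0.08)   # chi-square change for SK
p_kt = chi2_p_df1(4.36)   # chi-square change for KT
```

Under the 1 degree of freedom assumption, the recomputed p-values fall on the same side of the thresholds listed in the table for SK (> 0.05) and KT (< 0.04).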
Finally, the best fit model was used to predict the
identity of the short-lag stops of the testing subset.
The model performed with 87% accuracy (out-of-sample
error = 0.13). That is, given the data for SD,
RI and COG, it was possible to predict whether the
stop was Spanish /t/ or English /d/ on 87% of the
testing subset (142 tokens).
The results of the analyses showed that the phoneti-
cally voiceless coronal stops of English and Spanish
can be distinguished by RI and by the spectral shape
of the stop burst. Importantly, the present study in-
dicates that the place of articulation differences de-
scribed for Spanish /t/ and English /d/ are best ac-
counted for using measures of SD, RI, and COG. SK
and KT did not significantly contribute to predicting
the POA of the short-lag stops in our data.
Our results partially corroborate the findings of
[17]. As is the case in the Canadian varieties of En-
glish and French investigated in [17], the two vari-
eties of English and Spanish investigated here dif-
fer in place of articulation of their coronal stops—
alveolar (English) and dental (French, Spanish), ac-
cording to impressionistic descriptions. In our data,
similar to [17], values of SD and other burst measures varied across the two languages; however, unlike in our data, she also found significant differences for kurtosis.
The present study contributes language-specific
acoustic characteristics of bursts in the short-lag
coronal stops of two monolingual varieties of En-
glish and Spanish. Among other things, the findings
provide base acoustic descriptions for future studies
on Spanish-English bilinguals.
[1] Baayen, R. H., Davidson, D. J., Bates, D. M. Nov.
2008. Mixed-effects modeling with crossed ran-
dom effects for subjects and items. Journal of
Memory and Language 59, 390–412.
[2] Barr, D. J., Levy, R., Scheepers, C., Tily, H. J. Apr.
2013. Random effects structure for confirmatory
hypothesis testing: Keep it maximal. Journal of
Memory and Language 68, 255–278.
[3] Boersma, P., Weenink, D. 2012. Praat: doing pho-
netics by computer. Glot International 5, 341–345.
[4] Cole, J., McMurray, B., Munson, C., Linebaugh,
G. 2010. Unmasking the acoustic effects of vowel-
to-vowel coarticulation: A statistical modeling ap-
proach. Journal of Phonetics 38, 167–184.
[5] Flege, J. E., Munro, M., MacKay, I. 1995. Effects
of age of second-language learning on the produc-
tion of English consonants. Speech Communication
16, 1–26.
[6] Fowler, C. A. 1984. Segmentation of coarticulated
speech in perception. Perception & Psychophysics
36, 359–368.
[7] Gordon, M., Barthmaier, P., Sands, K. 2002. A
cross-linguistic acoustic study of voiceless frica-
tives. Journal of the International Phonetic Associ-
ation 32, 141–174.
[8] Gow, D. W. 2003. Feature parsing: Feature cue
mapping in spoken word recognition. Perception
& Psychophysics 65, 575–590.
[9] Hualde, J. I. 2005. The sounds of Spanish. Cam-
bridge University Press.
[10] Jones, M. J., Knight, R.-A. 2013. Bloomsbury
Companion to Phonetics. London, UK: A & C
[11] Jongman, A., Wayland, R., Wong, S. 2000. Acous-
tic characteristics of English fricatives. The Journal
of the Acoustical Society of America 108, 1252–
[12] Lieberman, P., Blumstein, S. E. 1988. Speech phys-
iology, speech perception, and acoustic phonetics.
Cambridge, UK: Cambridge University Press.
[13] Lisker, L., Abramson, A. S. 1964. A cross-
language study of voicing in initial stops: Acous-
tical measurements. Word 20, 384–422.
[14] Nakagawa, S., Schielzeth, H. 2013. A general and
simple method for obtaining R2 from generalized
linear mixed-effects models. Methods in Ecology
and Evolution 4, 133–142.
[15] Picard, M. 1987. An introduction to the comparative phonetics of English and French in North America volume 7. Amsterdam, The Netherlands: John Benjamins Publishing.
[16] R Core Team, 2014. R: A Language and Environ-
ment for Statistical Computing. R Foundation for
Statistical Computing Vienna, Austria.
[17] Sundara, M. 2005. Acoustic-phonetics of coronal stops: A cross-language study of Canadian English and Canadian French. The Journal of the Acoustical Society of America 118, 1026–1037.
[18] Sundara, M., Polka, L., Baum, S. 2006. Production
of coronal stops by adult simultaneous bilinguals.
Bilingualism: Language and Cognition 9, 97–114.
[19] Williams, L. 1977. The perception of stop conso-
nant voicing by Spanish-English bilinguals. Per-
ception & Psychophysics 21, 289–297.
[20] Williams, L. 1979. The modification of speech per-
ception and production in second-language learn-
ing. Perception & Psychophysics 26, 95–104.
¹ See [6], [8] and [4] for discussion and examples of this
² Degrees of freedom for hypothesis tests were derived using the Satterthwaite approximation as implemented in the lmerTest package in R [16].
... One notable issue regarding voice timing that bares on this line of research is related to the mea-sure of VOT itself. Manifold studies show that VOT is modulated by linguistic factors, such as place of articulation (Cho and Ladefog 1999), word position (Antoniou et al. 2010), lexical stress (Casillas et al. 2015), and speech rate (Magloire and Green 1999) in monolingual and bilingual speech. For instance, faster speech is associated with shorter VOT and slower speech is associated with longer VOT, though the size of the effect may be language specific. ...
... Importantly, the coronal stops of each language also differ regarding place of articulation. In Spanish coronal stops are described as dental, whereas in English they are described as alveolar (Casillas et al. 2015). ...
... Higher kurtosis values are found in Spanish monolingual coronals with regard to English monolingual coronals. This difference reflects the place of articulation differences between Spanish (dental) and English (alveolar) coronals (see Casillas et al. 2015;Sundara et al. 2006). The plots can be interpreted using the quadrants specified by the vertical and horizontal dotted lines. ...
Full-text available
Previous studies attest that some early bilinguals produce the sounds of their languages in a manner that is characterized as “compromise” with regard to monolingual speakers. The present study uses meta-analytic techniques and coronal stop data from early bilinguals in order to assess this claim. The goal was to evaluate the cumulative evidence for “compromise” voice-onset time (VOT) in the speech of early bilinguals by providing a comprehensive assessment of the literature and presenting an acoustic analysis of coronal stops from early Spanish–English bilinguals. The studies were coded for linguistic and methodological features, as well as effect sizes, and then analyzed using a cross-classified Bayesian meta-analysis. The pooled effect for “compromise” VOT was negligible (β = −0.13). The acoustic analysis of the coronal stop data showed that the early Spanish–English bilinguals often produced Spanish and English targets with mismatched features from their other language. These performance mismatches presumably occurred as a result of interlingual interactions elicited by the experimental task. Taken together, the results suggest that early bilinguals do not have “compromise” VOT, though their speech involves dynamic phonetic interactions that can surface as performance mismatches during speech production.
... All speech stimuli were 210 ms in duration with a 10 ms burst, 30 ms formant transition and 115 ms of steady-state (vowel). Since the place of articulation for coronal stops in English (i.e., alveolar) and Spanish (i.e., dental) is discriminated differently based on age of second language acquisition (Casillas, Díaz & Simonet, 2015;Sundara, Polka & Baum, 2006;Sundara & Polka, 2008), we kept the burst properties consistent across all stimuli. We also kept the vowel properties consistent to isolate VOT as the only perceptual cue that differed across stimuli. ...
Speech perception involves both conceptual cues and perceptual cues. These, individually, have been shown to guide bilinguals’ speech perception; but their potential interaction has been ignored. Explicitly, bilinguals have been given perceptual cues that could be predicted by the conceptual cues. Therefore, to target the perceptual-conceptual interaction, we created a restricted range of perceptual cues that either matched, or mismatched, bilinguals’ conceptual predictions based on the language context. Specifically, we designed an active speech perception task that concurrently collected electrophysiological data from Spanish–English bilinguals and English monolinguals to address the extent to which this cue interaction uniquely affects bilinguals’ speech sound perception and allocation of attentional resources. Bilinguals’ larger MMN-N2b in the mismatched context aligns with the Predictive Coding Hypothesis to suggest that bilinguals use their diverse perceptual routines to best allocate cognitive resources to perceive speech.
... However, the coronal stops in these two languages differ not only in terms of VOT, but also in place of articulation (Casillas, Díaz & Simonet, 2015), as /d/ and /t/ are alveolar in English while they are dental (i.e., d t ) in both Spanish and Catalan. Therefore, the English /d t/ ...
Full-text available
Learning a foreign language (L2) in an instructional setting is characterized by limited exposure to the target language. This scenario might be problematic for accurate L2 language learning, since authentic input is necessary to enhance L2 learning. Against this background, a possible source of target language experience and immediate corrective feedback can be found in L2 phonetic training, as it provides learners with native input and may focus on particularly challenging L2 structures. This study compares the effect of two high variability phonetic training (HVPT) methods on specifically attended sounds and on implicitly exposed but unattended sounds. Several training regimes are implemented aimed at improving the perception and production of a subset of English vowels (/i ɪ æ ʌ ɜː/) and initial and final stops by Spanish/Catalan bilingual learners of English. Thus this study addresses the following questions: (a) whether training can improve the perception and production of trained as well as untrained segments, (b) whether improvement generalizes to novel stimuli and talkers, (c) if improvement is retained over time, (d) which training method (Identification (ID) or categorical Discrimination (DIS)) is more effective, and (e) what are the participants‟ impressions of phonetic training as a L2 training tool. A total of 100 bilingual Catalan/Spanish learners of English were divided into four experimental groups and a control group and were tested on their identification of English sounds presented in CVC non-words before and after a five-week training period, and two months later. L2 production was assessed before and immediately after training through a picture naming task and analysed by means of native speaker judgments. The trained groups differed either in terms of training method (ID, DIS) or focus of training (consonants, vowels), resulting in four different groups. Crucially, all four groups were trained with the same sets of CVC non-words (e.g. 
zat, zut, zad, zud), exposing learners to attended contrasts within trials and to unattended contrasts across trials. The results reveal that all experimental groups significantly outperform the controls in their identification of trained sounds (vowels and initial stops), showing the efficacy of both phonetic training methodologies (ID and categorical AX DIS). However, while both experimental groups perform similarly when modifying initial stop perception, the ID trainees outperform the DIS trainees on trained vowel perception. These results suggest that modifying the perception of different types of segments might require different training procedures and amounts of training time. Interestingly, only the DIS trainees show a significant improvement in the perception of untrained/unattended L2 sounds, indicating that this training method may be more suited to enhancing learners‟ perception of attended as well as unattended target sounds. Regarding generalization and retention, the results point to the superiority of the ID task over a categorical DIS task when training vowel sounds. Moreover, the results indicate that both methods are well suited to training initial consonants to the same extent. With respect to production, only the vowel ID trainees significantly improve their production of trained sounds, which shows that pronunciation improvement might take place as a result of an identification perceptual training regime, even in the absence of production training. Finally, students‟ opinions of phonetic training as an EFL tool are positive overall and ID is favoured over DIS as a training method. Globally, these findings suggest that while both methods are effective for training L2 perception, ID and DIS methods may promote improvement, generalization and retention for vowels and for consonants to different degrees. 
The better results obtained with ID training, particularly for vowels, and the fact that only DIS promoted improvement with untrained sounds (cross-training effects) may be related to the nature and focus of the tasks and/or to the acoustic characteristics of the target sounds. These results may have implications for future research on phonetic training and practical applications in the teaching of L2 pronunciation.
... Esta composición de uso convierte al VOT en una medida de referencia interesante para el ejercicio de la clínica en habla (CH), sin embargo, su comportamiento varia en función de las relaciones fonológicas y coarticulatorias propias de cada lengua (9), (10), (11), (12) , es por ello que se requieren medidas de referencias ajustadas a las manifestaciones sociales, culturales y demográficas del grupo de hablantes en medio el cual se desarrollan las prácticas de CH. ...
Full-text available
INTRODUCCIÓN: El objetivo del trabajo es construir medidas de referencia para el VOT del español hablado en la zona nororiental de Colombia a partir de una población de estudiantes universitarios. MÉTODOS: Estudio descriptivo transversal, la selección de la población fue aleatoria a partir de muestreo estratificado. El numero total de participantes fue de 35, 17 mujeres y 18 hombres. Las muestras acústicas se tomaron en una cabina sonamortiguada usando un micrófono unidireccional. Los datos obtenidos recibieron análisis de tendencia central y de correlación canónica. RESULTADOS: Las medias de VOT para las oclusivas sordas en la población fueron: (1) /p/ 15,70 s.; (2) /t/ 15,56 s.; (3) /k/ 30,38 s. ANÁLISIS Y DISCUSIÓN: No se encontró relación estadística significativa entres VOT con edad, género o lugar de procedencia. El VOT cambi de forma significativa en presencia de los sonidos /i/, /e/ en razón al efecto coarticulador de estos sonidos. CONCLUSIONES: Es necesario ampliar los participantes del estudio a fin de indagar sobre los efectos de la varión regional del español colombiano sobre VOT. Se recomienda utilizar como estrategia de normalización del VOT en futuros estudios una toma triple para cada ingreso de datos.
Linear mixed-effects models (LMEMs) have become increasingly prominent in psycholinguistics and related areas. However, many researchers do not seem to appreciate how random effects structures affect the generalizability of an analysis. Here, we argue that researchers using LMEMs for confirmatory hypothesis testing should minimally adhere to the standards that have been in place for many decades. Through theoretical arguments and Monte Carlo simulation, we show that LMEMs generalize best when they include the maximal random effects structure justified by the design. The generalization performance of LMEMs including data-driven random effects structures strongly depends upon modeling criteria and sample size, yielding reasonable results on moderately-sized samples when conservative criteria are used, but with little or no power advantage over maximal models. Finally, random-intercepts-only LMEMs used on within-subjects and/or within-items data from populations where subjects and/or items vary in their sensitivity to experimental manipulations always generalize worse than separate F1 and F2 tests, and in many cases, even worse than F1 alone. Maximal LMEMs should be the ‘gold standard’ for confirmatory hypothesis testing in psycholinguistics and beyond.
This study investigated the acoustic phonetics of coronal stop production by adult simultaneous bilingual and monolingual speakers of Canadian English (CE) and Canadian French (CF). Differences in the phonetics of CF and CE include voicing and place of articulation distinctions. CE has a two-way voicing distinction (in syllable-initial position) contrasting short- and long-lag VOT; coronal stops in CE are described as alveolar. CF also has a two-way voicing distinction, but contrasting lead and short-lag VOT; coronal stops in CF are described as dental. Acoustic analyses of stop consonants for both VOT and dental/alveolar place of articulation are reported. Results indicate that simultaneous bilingual as well as monolingual adults produce language-specific differences, albeit not in the same way, across CF and CE for voicing and place. Similarities and differences between simultaneous bilingual and monolingual adults are discussed to address phonological organization in simultaneous bilingual adults.
This study examined the production of English consonants by native speakers of Italian. The 240 adult native Italian speakers of English who participated had begun learning English when they emigrated to Canada between the ages of 2 and 23 years. Word-initial, word-medial and word-final tokens of English stops and fricatives were assessed both acoustically and through forced-choice judgments made by native English-speaking listeners. The native Italian subjects' ages of learning (AOL) English exerted a systematic effect on their production of English consonants even though they had lived in Canada for an average of 32 years and reported speaking English more than Italian. In all but two instances, one or more native Italian subgroups defined on the basis of AOL differed significantly from subjects in a native English (NE) control group. The AOL of the first native Italian subgroup to differ from the NE subjects varied across consonant and syllable position. The results are discussed in terms of hypotheses proposed in the literature concerning the basis of segmental errors in L2 speech production.
The use of both linear and generalized linear mixed-effects models (LMMs and GLMMs) has become popular not only in social and medical sciences, but also in biological sciences, especially in the field of ecology and evolution. Information criteria, such as Akaike Information Criterion (AIC), are usually presented as model comparison tools for mixed-effects models. The presentation of 'variance explained' (R²) as a relevant summarizing statistic of mixed-effects models, however, is rare, even though R² is routinely reported for linear models (LMs) and also generalized linear models (GLMs). R² has the extremely useful property of providing an absolute value for the goodness-of-fit of a model, which cannot be given by the information criteria. As a summary statistic that describes the amount of variance explained, R² can also be a quantity of biological interest. One reason for the under-appreciation of R² for mixed-effects models lies in the fact that R² can be defined in a number of ways. Furthermore, most definitions of R² for mixed-effects models have theoretical problems (e.g. decreased or negative R² values in larger models) and/or their use is hindered by practical difficulties (e.g. implementation). Here, we make a case for the importance of reporting R² for mixed-effects models. We first provide the common definitions of R² for LMs and GLMs and discuss the key problems associated with calculating R² for mixed-effects models. We then recommend a general and simple method for calculating two types of R² (marginal and conditional R²) for both LMMs and GLMMs, which are less susceptible to common problems. This method is illustrated by examples and can be widely employed by researchers in any fields of research, regardless of software packages used for fitting mixed-effects models. The proposed method has the potential to facilitate the presentation of R² for a wide range of circumstances.
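The abstract above names marginal and conditional R² without giving their definitions. As a sketch only, the form commonly cited for a Gaussian LMM is shown below; the symbols are assumptions introduced here, not taken from the abstract: \sigma_f^2 is the variance of the fixed-effects predictions, \sigma_l^2 is the variance of the l-th of u random effects, and \sigma_\varepsilon^2 is the residual variance. GLMMs would need an additional distribution-specific variance term in the denominator.

```latex
% Marginal R^2: variance explained by the fixed effects alone
R^2_{\mathrm{marginal}} =
  \frac{\sigma_f^2}{\sigma_f^2 + \sum_{l=1}^{u} \sigma_l^2 + \sigma_\varepsilon^2}

% Conditional R^2: variance explained by fixed and random effects together
R^2_{\mathrm{conditional}} =
  \frac{\sigma_f^2 + \sum_{l=1}^{u} \sigma_l^2}{\sigma_f^2 + \sum_{l=1}^{u} \sigma_l^2 + \sigma_\varepsilon^2}
```

Both quantities share the same denominator, so the conditional value is never smaller than the marginal one.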
Results of an acoustic study of voiceless fricatives in seven languages are presented. Three measurements were taken: duration, center of gravity, and overall spectral shape. In addition, formant transitions from adjacent vowels were measured for a subset of the fricatives in certain languages. Fricatives were well differentiated in terms of overall spectral shape and their co-articulation effects on formant transitions for adjacent vowels. The center of gravity measurement also proved useful in differentiating certain fricatives. Duration generally was less useful in differentiating the fricatives. In general, results were consistent across speakers and languages, with lateral fricatives displaying the greatest interlanguage variation in their acoustic properties and /s/ providing the greatest source of interspeaker variation.
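The center of gravity mentioned above is the first spectral moment, and the spectral standard deviation is the square root of the second central moment. The sketch below is illustrative only: it assumes the spectrum is already available as paired lists of frequencies and power values and that the moments are power-weighted. The function name `spectral_moments` and the toy spectrum are hypothetical, and the procedure actually used in the study is not described in the abstract.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (center_of_gravity, standard_deviation) of a power spectrum.

    freqs_hz: frequencies of the spectral bins, in Hz
    power:    non-negative power value for each bin (the weights)
    """
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum must contain some positive power")

    # First moment: power-weighted mean frequency (center of gravity).
    cog = sum(f * p for f, p in zip(freqs_hz, power)) / total

    # Second central moment: power-weighted variance around the centroid.
    variance = sum((f - cog) ** 2 * p for f, p in zip(freqs_hz, power)) / total
    return cog, math.sqrt(variance)


# Hypothetical toy spectrum with three bins, symmetric around 2000 Hz.
freqs = [1000.0, 2000.0, 3000.0]
power = [1.0, 2.0, 1.0]
cog, sd = spectral_moments(freqs, power)
print(f"center of gravity: {cog:.1f} Hz, standard deviation: {sd:.1f} Hz")
```

Because the toy spectrum is symmetric, its center of gravity falls on the middle bin; a spectrum with more energy at higher frequencies would yield a higher value.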
The performance of Spanish-English bilinguals in two perception tasks, using a synthetic speech continuum varying in voice onset time, was compared with the performance of Spanish and English monolinguals. Voice onset time in speech production was also compared between these groups. Results in perception of bilinguals differed from that of both monolingual groups. Results of bilingual production in their two languages conformed with results obtained from each monolingual group. The perceptual results are interpreted in terms of differences in the use of available acoustic cues by bilingual and monolingual listeners of English and Spanish.