Formant dynamics in second language speech: Japanese speakers' production of English liquids

Abstract

This article reports an acoustic study analysing the time-varying spectral properties of word-initial English liquids produced by 31 first-language (L1) Japanese and 14 L1 English speakers. While it is widely accepted that L1 Japanese speakers have difficulty in producing English /l/ and /ɹ/, the temporal characteristics of L2 English liquids are not well-understood, even in light of previous findings that English liquids show dynamic properties. In this study, the distance between the first and second formants (F2–F1) and the third formant (F3) are analysed dynamically over liquid-vowel intervals in three vowel contexts using generalised additive mixed models (GAMMs). The results demonstrate that L1 Japanese speakers produce word-initial English liquids with stronger vocalic coarticulation than L1 English speakers. L1 Japanese speakers may have difficulty in dissociating F2–F1 between the liquid and the vowel to a varying degree, depending on the vowel context, which could be related to perceptual factors. This article shows that dynamic information uncovers specific challenges that L1 Japanese speakers have in producing L2 English liquids accurately.
Takayuki Nagamine
Department of Linguistics and English Language, County South, Lancaster University, Lancaster,
LA1 4YL, United Kingdom a)
(Dated: 14 December 2023)
I. INTRODUCTION
A. Acquisition of English /l/ and /ɹ/ by L1 Japanese speakers
The current study investigates time-varying spectral properties of English liquids produced by first-language (L1) Japanese speakers. Numerous studies have shown that the acquisition of English liquids is particularly challenging for L1 Japanese speakers (e.g., Aoyama et al., 2019; Best and Strange, 1992; Flege et al., 1995; Saito and Munro, 2014; Sheldon and Strange, 1982). They typically perceive English /l/ and /ɹ/ as instances of a single L1 category of Japanese /r/ (e.g., Best and Strange, 1992; Guion et al., 2000). This corresponds to the learning of ‘similar’ phones between L1 and L2 in the Speech Learning Model (SLM: Flege, 1995; Flege and Bohn, 2021) and the single-category (SC) or the category-goodness (CG) assimilation scenarios in the Perceptual Assimilation Model of Second Language (L2) Speech Learning (PAM-L2: Best and Strange, 1992; Best and Tyler, 2007; Hattori and Iverson, 2009), predicting a moderate to substantial difficulty in acquisition of the L2 sounds. SLM posits that perceptual accuracy lays the foundation for accurate L2 speech production because L2 learners develop articulatory rules in the L2 phonetic categories that are established over the course of L2 speech learning (Flege and Bohn, 2021).
a) t.nagamine@lancaster.ac.uk
The difficulty L1 Japanese speakers face in acquiring English /l/ and /ɹ/ is associated with their sensitivity to the phonetic cues used to distinguish the contrast. The key spectral dimension that contrasts English /l/ and /ɹ/ is the frequency of the third formant (F3); American English /ɹ/ is associated with a notably low F3 at 1,300 Hz for male speakers and 1,800 Hz for female speakers, whereas laterals show a high F3 at approximately 2,500-2,800 Hz (Espy-Wilson, 1992; Stevens, 2000). The F2 frequency is associated with the resonance of the vocal tract cavity posterior to the primary constriction for both laterals and rhotics, which are commonly produced with a backed tongue body configuration (Stevens, 2000). Laterals are generally characterised by clear-dark allophony according to syllabic position; ‘clear’ /l/s are often associated with laterals in pre-vocalic, syllable-initial position, and they typically have higher F2 values and a greater separation between F2 and F1 (F2–F1) than the post-vocalic ‘dark’ counterpart (Carter and Local, 2007; Recasens, 2012). American English exhibits relatively darker realisations of liquids than British English overall, but syllable-initial laterals in American English are still somewhat ‘clearer’ than syllable-final counterparts (Recasens, 2012). This clear-dark allophony according to the syllable position results from different articulatory configurations, such that the degree of the tongue body retraction is greater for the final laterals than for the initial laterals (Recasens, 2012).
L1 Japanese speakers tend to rely more on the less reliable cue of F2 than on the more robust cue of F3 in their perception of English /l ɹ/ (Iverson et al., 2003; Saito and Munro, 2014). As a result, they tend to produce the distinction along the F2 dimension instead of learning to make a contrast along F3 (Aoyama et al., 2019; Saito and van Poeteren, 2018). For instance, they produce word-initial English /l/ with a somewhat higher F2 (approximately 1,500-1,800 Hz) than L1 English speakers (approximately 1,200-1,500 Hz), whereas F2 frequencies for English /ɹ/ are similar between the two speaker populations (Aoyama et al., 2019; Flege et al., 1995). As for F3, they produce English /ɹ/ with a relatively high F3 (2,000-2,600 Hz) but produce /l/ with F3 values comparable to those of L1 English speakers (Aoyama et al., 2019; Flege et al., 1995; Saito and Munro, 2014). Nevertheless, previous research claims that L1 Japanese speakers could learn to use the acoustic cues as L1 English speakers would do, especially for F1 and F2; several studies reported similar F1 values in production of English liquids between L1 Japanese and L1 English speakers (Aoyama et al., 2019; Flege et al., 1995; Saito and Munro, 2014). Saito and Munro (2014) also argue that the use of F2 is easier for L1 Japanese speakers to acquire than that of F3 for English /ɹ/ based on findings that L1 Japanese speakers who resided in Canada for longer than 2.5 months produced native-like F2 values for English /ɹ/ compared to those who had less overseas experience.
J. Acoust. Soc. Am. / 14 December 2023 JASA 1
The degree of difficulty in L1 Japanese speakers’ acquisition of English liquids also varies depending on the vowel context: in perception, they are better at correctly identifying word-initial English liquids adjacent to front vowels than those adjacent to back vowels (Shimizu and Dantsuji, 1983). This might be because L1 Japanese speakers also perceive English /l/ and /ɹ/ as a sequence of a back vowel and a tap (i.e., [ɯɾ]), possibly due to the vocalic nature of English liquids (Guion et al., 2000). L1 Japanese speakers are more likely than L1 English speakers to hear a /w/-like percept when perceiving English /l/ and /ɹ/ (Best and Strange, 1992; Mochizuki, 1981; Yamada and Tohkura, 1992). These results overall suggest that L1 Japanese speakers are sensitive not only to the phonemic status but also to the phonetic details of English /l/ and /ɹ/. In particular, Shimizu and Dantsuji (1983) speculate that coarticulatory properties may play a role in explaining the vocalic contextual effects in L1 Japanese speakers’ correct identification of English /l/ and /ɹ/.
B. Dynamic analysis of English liquids
Although the errors in segmental realisation in L2 speech are claimed to be rooted in perception, accurate perception does not always entail accurate production (Flege and Bohn, 2021; Sheldon and Strange, 1982). While this does not mean that the role of perceptual accuracy should be discounted, it implies that L2 speech production may be shaped by a combination of factors in addition to perceptual accuracy.
One such factor is the dynamic nature of the production of English liquids. Articulation of English liquids requires coordination of multiple articulatory gestures for accurate production (Campbell et al., 2010; Sproat and Fujimura, 1993). English laterals, for instance, involve coordination of tongue tip and dorsum gestures, and the timing and magnitude interact with the syllabic position; a tongue tip gesture precedes a tongue dorsum gesture with a greater magnitude for clear /l/ whereas the two gestures could be timed synchronously for the dark /l/ (Sproat and Fujimura, 1993). English rhotics show similar patterning of gestural timing and magnitude, where labial gestures precede the tongue tip and tongue body gestures (Campbell et al., 2010; Proctor et al., 2019). The dynamic nature of articulation in English liquids suggests that the acoustic characteristics of English liquids are inherently non-static, and it is, therefore, often challenging to select a single point in time that adequately represents liquid quality (Kirkham et al., 2019; Ying et al., 2012).
In addition, acoustic realisations of liquids interact with the neighbouring segments as a result of coarticulation. While coarticulation is often viewed as a consequence of the physiological mechanisms in the transition between segmental targets, some aspects of coarticulation may be language-specific and thus need to be learnt (Beristain, 2022; Keating, 1985). Word-initial /ɹ/ in English, for instance, shows lower F3 values when followed by back vowels compared to other vowel conditions (King and Ferragne, 2020). Similarly, vowel context influences realisations of American English /l/, particularly among word-initial /l/s, such that F2 values are higher in the /i/ context than in the /a/ context (Recasens, 2012). Coarticulatory effects of liquids can extend beyond the domain of the liquid segment itself and provide a perceptual basis for listeners to distinguish English /l/ and /ɹ/ (West, 1999a,b).
The findings regarding the dynamic nature of liquid production and liquid-vowel coarticulation may account for the specific difficulties that L1 Japanese speakers have in producing English /l/ and /ɹ/. L1 Japanese speakers tend to substitute English /l/ and /ɹ/ with an alveolar tap or flap [ɾ], a canonical realisation of Japanese /r/ (Riney et al., 2000). Previous articulatory studies show that alveolar taps/flaps show stronger coarticulatory effects with the neighbouring vowels than English laterals and rhotics; while the tongue dorsum gesture is actively involved in the production of English /l/ and /ɹ/, taps and flaps [ɾ] show either less involvement of the tongue dorsum or a ‘stabilisation’ tongue dorsum gesture, resulting in stronger coarticulation with the vowel (Morimoto, 2020; Proctor, 2011; Recasens, 1991; Yamane et al., 2015). Furthermore, an X-ray study suggests that L1 Japanese speakers’ articulation of English liquids shows greater variability according to the vocalic environment (Zimmermann et al., 1984). In sum, Japanese and English liquids differ in the way they are coarticulated with the vowels, and it can be predicted that L1 Japanese speakers exhibit different liquid-vowel coarticulatory patterns from those of L1 English speakers.
Despite the findings regarding the complexity involved in the production of English liquids, our understanding remains relatively limited regarding the specific mechanism whereby L1 Japanese speakers struggle to produce English /l/ and /ɹ/. This may be because previous research commonly evaluates liquid quality based on a single-point measurement, in which formant frequencies are measured at one point in time, such as the F3 minima, the spectral onset or the spectral release (Aoyama et al., 2019; Flege et al., 1995; Saito and Munro, 2014). Analysis of liquids based on a single measurement, however, inevitably averages out temporal information that may be important for understanding the dynamic characteristics of English liquids.
In the current study, I show that dynamic formant measurement of English liquids allows us to better understand specific challenges that L1 Japanese speakers have in producing English /l/ and /ɹ/. Previous research suggests that (1) L1 Japanese speakers’ acquisition of English liquids may be influenced by the phonetic details such as vowel environments, and (2) English liquids show dynamic characteristics and interactions with the neighbouring vowels. Given these, I hypothesise that L1 Japanese speakers’ production of English liquids will exhibit different dynamic acoustical properties compared to L1 English speakers. This study therefore asks what dynamic acoustic properties L1 Japanese speakers would show in their production of English /l ɹ/ compared to L1 English speakers.
I combine static and dynamic analyses of the acoustic properties of English liquids in this study. The static analysis investigates the distance between second and first formants (F2–F1) and the third formant (F3) extracted at the liquid midpoint. The inclusion of this measure allows me to discuss the results in light of previous research in which the single-measurement analysis has been widely used (e.g., Aoyama et al., 2019; Flege et al., 1995; Saito and Munro, 2014; Saito and van Poeteren, 2018). In addition, the time-varying changes in the F2–F1 and F3 values will capture the complex nature of liquid acoustics and the coarticulatory interactions between the liquid and the vowel (Howson and Redford, 2021; Kirkham et al., 2019; Sproat and Fujimura, 1993).
II. METHODS
A. Participants
The data for the current study are obtained from 45 speakers: 31 L1 Japanese learners of English (17 female and 14 male) aged between 18 and 22 years (M = 19.81 years, SD = 1.05) and 14 L1 North American English speakers (11 female and 3 male) aged between 21 and 43 years (M = 28.93 years, SD = 6.08).
All of the L1 Japanese speakers were undergraduate university students recruited from two universities in Japan, located near the cities of Nagoya and Kobe respectively. Their profile is considered to be typical for average Japanese university students who study English as a foreign language; all of them studied English primarily through the school curriculum in primary school, secondary school, or both, and continued it at the tertiary level, with a mean length of English study of 9.31 years (SD = 2.42). They did not have an extended stay in an English-speaking country, with the length of overseas experience ranging from none to 4.25 months (M = 0.77 months, SD = 1.35).
To evaluate L1 Japanese speakers’ L2 English proficiency, participants were asked to rate their own oral fluency on a seven-point scale, from 1 (“I do not speak English at all”) to 7 (“No problems in using English in daily life”). Self-ratings were used because no common measure of English proficiency was available across participants: students had taken different kinds of tests, and first-year students had not yet taken any English language test. Nevertheless, judging from the test scores that some of the participants were able to provide and observations by the researcher who has experience in English language teaching in Japan, their English proficiency is considered to be lower to upper intermediate, which largely agrees with their subjective evaluation of their fluency in English (M = 3.84, SD = 1.10). Further details about the participants can be found in the online supplementary materials.
The 14 L1 English speakers identify themselves as fluent L1 speakers of North American English who grew up using English until 13 years of age. Five of them are from Canada and nine are from the US. They resided in the UK at the time of recording; six of them were postgraduate students enrolled at a UK university and the rest worked in companies in the UK. Recruitment of L1 North American English speakers reflects the situation that American English tends to be chosen as a pedagogical model in English language teaching in Japan and therefore it is appropriate for L1 Japanese speakers’ production to be compared to that of L1 North American English speakers (Setter and Jenkins, 2005).
B. Data collection
The audio recordings analysed in this study are a subset of the data collected for a larger study, in which both articulatory and acoustic data were obtained in a simultaneous high-speed ultrasound-audio recording setting. For this reason, the participants wore an ultrasound headset while recording stimuli for the current study. The participants were recorded in a sound-attenuated booth at universities in the UK for L1 North American English speakers and in a quiet room at universities in Japan for L1 Japanese speakers. In recording some of the L1 Japanese speakers, however, there was minor background fan noise because of the Covid-19 restrictions mandating air ventilation at the time of recording. Acoustic signals were pre-amplified, digitised and recorded onto a laptop computer via a Sound Devices USB-Pre2 audio interface at 44.1 kHz with 16-bit quantisation.
The participants were asked to sit in front of the laptop screen and read the stimuli words in isolation that were displayed one by one orthographically using Articulate Assistant Advanced (AAA) software version 220.4.1 (Articulate Instruments, 2022). No carrier phrases were used here because (1) the use of carrier phrases would impose additional difficulty on L1 Japanese speakers, especially those who were less proficient in English, and (2) the experiment had to be as short as possible due to time constraints in the data collection sessions.
In light of the language mode hypothesis (Grosjean, 2008) that the language setting in an experiment can influence the participants’ speech perception and possibly production, the recording sessions for the L1 Japanese speakers were structured as follows. The first half of the experiment, including briefing, equipment set-up and recording of the Japanese words (not presented in this paper), was conducted while I was giving instructions in Japanese. Then, I switched the language of instructions to English and the participant engaged in a short English conversation activity. This included a semi-structured dialogue in which I asked the participants five simple questions (e.g., ‘What do you study?’, ‘What do you like the best about the university?’). Finally, the Japanese participants recorded the English words while I gave all the instructions in English. While it would have been theoretically desirable to have an L1 English speaker lead the data collection session for English words, it was challenging for reasons of time and room availability given that each session for L1 Japanese speakers took up to 90 minutes.
The recording session with the L1 North American English speakers did not require such considerations because they recorded English words only. All the procedures were, therefore, conducted in English and each session took up to approximately 60 minutes. The participants were compensated for their time and participation with the amount of 2,000 Japanese yen or 15 British pounds sterling in the form of cash or vouchers commensurate with the regulations at each of the recording venues. The research project has been reviewed and approved by the ethics committees at Lancaster University, Kobe Gakuin University and Meijo University. Informed consent to take part in the study was obtained in written form from all participants.
C. Materials
Word-initial English /l/ and /ɹ/ were elicited in 16 monosyllabic CV(C)C words (eight minimal pairs), in which the liquid was followed by a close front vowel /i/, an open front vowel /æ/, or a close back vowel /u/ (see Table I). The coda consonants were restricted to bilabials /p b m/ or labiodentals /f v/ to minimise the anticipatory coarticulatory effects on the word-initial liquids. All the target words were checked using the Longman Pronunciation Dictionary (Wells, 2008) to ensure that they have the intended vowel environment in American English.
D. Segmentation and data processing
Prior to segmentation, audio recordings were low-pass filtered at 11,000 Hz and downsampled to 22,050 Hz. Automatic segmentation was carried out at phoneme level with Montreal Forced Aligner (MFA) version 2.0.6 (McAuliffe et al., 2017). I then inspected the aligned data visually and manually corrected the segmentation using Praat where necessary (Boersma and Weenink, 2022).
TABLE I. Word list per vowel context
Vowel context    Words
/i/              leap / reap    leaf / reef    leave / reeve
/æ/              lap / rap      lamb / ram     lamp / ramp
/u/              lube / rube    loom / room
I classified the liquid tokens into two broad categories: approximants and non-approximants, based on the spectrographic representations aided by auditory impressions. This decision reflects the consideration that the L1 Japanese speakers’ production of liquids might show a wide range of variations due to the allophonic variation of Japanese /r/ and their articulatory strategies for English /l/ and /ɹ/. Realisations for Japanese /r/ include other types of approximants than English liquids, such as the canonical [ɾ], retroflex flap [ɽ], retroflex lateral approximant [ɭ] and a lateral flap [ɺ] (Akamatsu, 1997; Arai, 2013). They may also use a single strategy or produce a reversed realisation for English /l ɹ/. It could be the case, for instance, that they produce a lateral liquid for both English /l ɹ/. It is also possible that they use [l] for English /ɹ/ and [ɹ] for English /l/. Classifications based on these two broad categories: approximants and non-approximants, therefore, guide me to choose an appropriate type of analysis while maximising the chance of capturing diverse acoustic properties in the L1 and L2 English liquids.
Based on these considerations, I first broadly labelled tokens as approximants if the liquid token in question shows a vowel-like formant structure (Ladefoged and Johnson, 2010). The spectral analysis focuses only on the tokens that are classified here as approximants; it thus excludes 281 non-approximant tokens (e.g., taps or flaps [ɾ]) out of a total of 2,914 tokens, leaving 2,633 tokens for further processing. The spectrographic examples of an approximant and a non-approximant token are shown in Figs. 1 and 2.
Following this, I segmented the liquid approximant tokens based on the primary cues of a steady state or an approximately steady state of the F2 and an abrupt change in amplitude in the waveform (Lawson et al., 2011). Laterals and rhotics in English involve various stages, including the transition into the liquid, the steady state, and the transition into the following vowel (Carter and Local, 2007). The current study uses the steady-state portion to define the liquid as in previous studies (Flege et al., 1995; Kirkham, 2017). Although the liquid steady-state is an approximation given the various stages involved in the liquid acoustics mentioned above, this issue can be minimised in the dynamic analysis because it shows holistic time-varying trajectories across the liquid and vowel.
FIG. 1. Example spectrogram of an L1 North American English speaker’s production of leaf. Labels show phonetic segments in ARPABET, in which ‘IY1’ indicates a stressed high front unrounded vowel /i/.
FIG. 2. Example spectrogram of a ‘definitely a tap’ token of leaf produced by an L1 Japanese speaker. Labels show phonetic segments in ARPABET, in which ‘IY1’ indicates a stressed high front unrounded vowel /i/.
E. Acoustic analysis
This study analyses 2,306 liquid tokens for mid-point analysis and 2,515 liquid-vowel tokens for dynamic analysis. The detailed breakdown is shown in Table II. The current study compares two acoustic parameters between L1 Japanese and L1 English speakers’ production of English liquids: (1) the distance between second (F2) and first (F1) formants (F2–F1) and (2) the third formant (F3). F2–F1 is used as a measure to evaluate acoustic liquid quality; lower F2–F1 values can be related to darker realisations of liquids, resulting from a greater degree of tongue retraction (Howson and Redford, 2021; Sproat and Fujimura, 1993). F3 is a primary acoustic dimension that distinguishes English /l/ and /ɹ/, and previous research reports robust differences between L1 Japanese and L1 English speakers’ production of English liquids.
F1, F2 and F3 values were estimated and extracted with Fast Track, an automatic formant estimation Praat plug-in (Barreda, 2021). Fast Track samples formant frequencies every 2 ms throughout the interval, resulting in smooth trajectories for F1 through F3. It then outputs the estimated formant frequencies while aggregating them in a specified number of bins. The current analysis uses 11 data points throughout the liquid-vowel interval for each formant trajectory. The advantage of using Fast Track is that it performs multiple-step formant estimations by adjusting the maximum formant frequency and obtains the ‘best-winning’ analysis based on regression analyses predicting the formant frequency as a function of time (Barreda, 2021). This achieves increased formant estimation accuracy by specifying different formant frequency ranges according to speakers’ age and gender.
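The aggregation of a densely sampled formant track into 11 points can be illustrated with a minimal Python sketch. This is not Fast Track's own code: the function name `bin_trajectory`, the use of the per-bin median, and the example track are assumptions made purely for illustration.

```python
from statistics import median

def bin_trajectory(samples, n_bins=11):
    """Aggregate a densely sampled formant track into n_bins values.

    Assumption: each bin is summarised by its median; the aggregation
    actually used by Fast Track may differ.
    """
    if len(samples) < n_bins:
        raise ValueError("need at least one sample per bin")
    binned = []
    for b in range(n_bins):
        # Integer arithmetic splits the samples into contiguous,
        # near-equal, non-empty bins.
        start = b * len(samples) // n_bins
        end = (b + 1) * len(samples) // n_bins
        binned.append(median(samples[start:end]))
    return binned

# Hypothetical F3 track (Hz): a 220 ms liquid-vowel interval sampled
# every 2 ms (110 samples), rising linearly from 1,800 Hz.
track = [1800 + 10 * i for i in range(110)]
points = bin_trajectory(track)
print(len(points), points[0], points[-1])
```

Time-normalising every token to the same number of points is what makes trajectories of different durations comparable in the later dynamic analysis.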
In the current study, the female and male speakers were analysed separately with different ranges of the upper formant frequency ceiling: between 5,000 and 7,000 Hz for female speakers and between 4,500 and 6,500 Hz for male speakers. Fast Track then performs 24-step formant estimations with varying upper-frequency ceilings and estimates the formant frequencies at 11 equidistant points during (1) the liquid and (2) the liquid-vowel intervals with a 25 ms window padded before and after the segment. After formant tracking, formant estimation errors can be corrected based on visual inspection of the 24-step analyses. Using this, I visually inspected all the tokens one by one and either improved the formant measurement by nominating a different winning analysis or omitted the tokens when none of the analyses looked reasonable. At this visual inspection stage, 118 tokens out of the 2,633 tokens (see Section II D) were excluded due to poor formant estimation accuracy.
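To make the 24-step procedure concrete, the following Python sketch lists candidate ceilings for each speaker group. It assumes that the ceilings are equally spaced and include both end points of the range; the paper does not state the spacing, and the helper `ceiling_steps` is hypothetical rather than part of Fast Track.

```python
def ceiling_steps(lowest_hz, highest_hz, n_steps=24):
    """Return n_steps candidate ceilings from lowest_hz to highest_hz.

    Assumption: linear spacing with both end points included.
    """
    step = (highest_hz - lowest_hz) / (n_steps - 1)
    return [lowest_hz + i * step for i in range(n_steps)]

# Ceiling ranges reported in the text for each speaker group.
female_ceilings = ceiling_steps(5000, 7000)
male_ceilings = ceiling_steps(4500, 6500)

print(len(female_ceilings), round(female_ceilings[1] - female_ceilings[0], 1))
```

Under this assumption, adjacent candidate analyses differ by roughly 87 Hz in their ceiling, and one of the 24 candidates is then nominated as the winning analysis for each token.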
Finally, Fast Track automatically omits tokens when they are shorter than 30 ms as formant estimation can be challenging for extremely short tokens. As a result, 209 tokens were excluded from the data set for the static analysis, leaving 2,306 tokens for static analysis and 2,515 tokens for dynamic analysis. The difference in the number of tokens reflects the greater number of liquid-only tokens being omitted automatically by Fast Track as they were inevitably shorter than liquid-vowel intervals. Note that the online supplementary materials outline the data processing procedure described here.
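The exclusion steps reported in Sections II D and II E can be checked with simple bookkeeping. The Python sketch below uses only the counts given in the text and in Table II; the variable names are illustrative.

```python
# Counts reported in the text.
total_tokens = 2914
non_approximants = 281       # taps, flaps, etc. (Section II D)
poor_tracking = 118          # removed after visual inspection
too_short_liquids = 209      # liquid-only intervals shorter than 30 ms

approximants = total_tokens - non_approximants
dynamic_tokens = approximants - poor_tracking
static_tokens = dynamic_tokens - too_short_liquids

# Table II counts as (/l/, /ɹ/) pairs for the /i/, /æ/ and /u/ contexts.
table_ii = {
    ("L1 English", "liquid"): [(155, 187), (188, 173), (119, 130)],
    ("L1 English", "liquid-vowel"): [(177, 197), (199, 192), (130, 134)],
    ("L1 Japanese", "liquid"): [(205, 246), (298, 286), (149, 170)],
    ("L1 Japanese", "liquid-vowel"): [(231, 284), (312, 310), (169, 180)],
}

def table_total(interval):
    """Sum all Table II cells for one interval type."""
    return sum(
        l_count + r_count
        for (group, kind), cells in table_ii.items()
        if kind == interval
        for l_count, r_count in cells
    )

print(approximants, dynamic_tokens, static_tokens)
print(table_total("liquid"), table_total("liquid-vowel"))
```

The cell totals of Table II agree with the 2,306 static and 2,515 dynamic tokens stated in the text.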
F. Statistical analysis
All statistical analyses were performed using R version 4.2.2 (R Core Team, 2022) and data visualisation was performed using the tidyverse suite (Wickham et al., 2019). Prior to the statistical analysis, the formant values were transformed into Bark scale using the bark function in the emuR package to allow for cross-speaker comparisons (Harrington, 2021).
TABLE II. The number of tokens per vowel context
Vowel context          /i/          /æ/          /u/
L1 English
  liquid (a)           155 / 187    188 / 173    119 / 130
  liquid-vowel (b)     177 / 197    199 / 192    130 / 134
L1 Japanese
  liquid (a)           205 / 246    298 / 286    149 / 170
  liquid-vowel (b)     231 / 284    312 / 310    169 / 180
(a) /l/ tokens on the left, /ɹ/ tokens on the right.
(b) /l/+vowel tokens on the left, /ɹ/+vowel tokens on the right.
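The Bark transformation can be sketched in Python as follows. The sketch assumes the Traunmüller (1990) formula and assumes that each formant is converted to Bark before the F2–F1 difference is taken; neither detail is spelled out in the text, and the formant values are hypothetical.

```python
def hz_to_bark(f_hz):
    """Convert Hz to Bark (assumed formula: Traunmüller, 1990)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_f2_f1(f1_hz, f2_hz):
    """Bark-scaled F2-F1 distance (assumption: convert first, then subtract)."""
    return hz_to_bark(f2_hz) - hz_to_bark(f1_hz)

# Hypothetical midpoint formants (Hz) for a single liquid token.
f1, f2, f3 = 350.0, 1300.0, 2600.0
print(round(bark_f2_f1(f1, f2), 2), round(hz_to_bark(f3), 2))
```

Because the Bark scale compresses higher frequencies, a given difference in Hz counts for less at the F3 range than at the F1 range, which is one reason an auditory scale is preferred for comparisons across speakers.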
For the static analysis, Bark-converted F2–F1 (Bark F2–F1) and F3 (Bark F3) at liquid midpoint were modelled using linear mixed-effect models (LME) using the lme4::lmer function (Bates et al., 2015). Separate models were constructed for /l/ and /ɹ/ respectively. The fixed effects included (1) the speaker’s first language (L1: i.e., English vs Japanese), (2) vowel context (vowel) and (3) the speaker’s gender (gender). No interactions were included because initial explorations suggested that the current data set does not have the statistical power to detect interactions.
Furthermore, an anonymous reviewer suggested classifying the participants into groups according to their English proficiency and including this variable for analysis. Following this, I classified the participants into four groups based on the distribution of their subjective fluency rating scores. L1 Japanese speakers are classified into the advanced (rating 5-6, n = 7), intermediate (rating 4, n = 14), and beginner (rating 1-3, n = 10) groups. L1 English speakers constitute a group on their own (L1 English; rating 7, n = 14). The L1 English speaker group, however, makes the proficiency variable confounded with the L1 variable, making the inclusion of the proficiency variable problematic. The issue is manifested in the rank-deficient warning for LMEs when both L1 and proficiency are included in the same model, suggesting that two or more variables are not linearly independent from each other. A further analysis using the caret::findLinearCombos function shows co-linearity between L1 and proficiency and suggests excluding the level of L1 English speakers from the proficiency variable.
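Why the two predictors cannot be separated can be seen from a toy design matrix. The Python sketch below uses hypothetical speakers and assumes treatment (dummy) coding with 'advanced' as the reference level; the function `proficiency_group` simply mirrors the rating bands given in the text.

```python
def proficiency_group(l1, rating):
    """Map a speaker to a proficiency group using the bands in the text."""
    if l1 == "English":
        return "L1 English"
    if rating >= 5:
        return "advanced"
    if rating == 4:
        return "intermediate"
    return "beginner"

# Hypothetical speakers: (first language, self-rated fluency).
speakers = [("Japanese", 6), ("Japanese", 4), ("Japanese", 2),
            ("English", 7), ("English", 7)]

# Dummy-coded columns of a model containing both L1 and proficiency.
rows = []
for l1, rating in speakers:
    group = proficiency_group(l1, rating)
    rows.append({
        "intercept": 1,
        "L1_Japanese": int(l1 == "Japanese"),
        "prof_intermediate": int(group == "intermediate"),
        "prof_beginner": int(group == "beginner"),
        "prof_L1English": int(group == "L1 English"),
    })

# L1_Japanese + prof_L1English equals the intercept column in every row,
# so the columns are linearly dependent and the model is rank-deficient.
dependent = all(
    r["L1_Japanese"] + r["prof_L1English"] == r["intercept"] for r in rows
)
print(dependent)
```

Since the 'L1 English' proficiency level carries exactly the same information as the L1 contrast, one of the two columns has to be dropped before the model can be estimated.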
For this reason, I perform a separate analysis focussing only on the L1 Japanese speakers’ data to investigate the effects of proficiency and summarise the results at the end of the static analysis. I have included L1 Japanese speakers only here because inclusion of L1 English speakers might reduce the magnitude of between-group differences among L1 Japanese speakers. The visualisation includes L1 English speakers’ data only for the purpose of comparison. Although I will not explore this extensively as this is not the main focus of the study, interested readers are referred to the online supplementary materials for further details of the analysis and results.
The random effect structure for the linear models included by-participant varying intercepts, by-participant varying slopes for vowel context, and by-word varying intercepts. As a result, the following specification is used for the four final models (i.e., models predicting Bark F2–F1 and Bark F3 for /l/ and /ɹ/):
lmer(Bark F2–F1 or Bark F3 ~ L1 + vowel + gender + (1 | word) + (1 + vowel | speaker))
The significance of the fixed effects was tested via likelihood ratio testing by comparing the full model and the nested model excluding the fixed effect in question (Winter, 2020). If the full model significantly improved the model fit, I concluded that the fixed effect in question significantly influenced the outcome variable. The patterns associated with the vowel contexts are interpreted via data visualisation for the sake of model simplicity, but additional statistical comparisons are available in the online supplementary materials.
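The logic of a likelihood ratio test can be sketched in Python with the standard library only. The log-likelihood values below are hypothetical, and the helper functions are illustrative stand-ins rather than the R routines used in the study; the test statistic is twice the difference in log-likelihood, referred to a chi-square distribution whose degrees of freedom equal the number of parameters removed.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return exp(-half) * total
    # Odd df: start from df = 1 and apply the incomplete-gamma recurrence.
    sf = erfc(sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        sf += half ** (k - 0.5) * exp(-half) / gamma(k + 0.5)
    return sf

def likelihood_ratio_test(loglik_full, loglik_nested, df):
    """Return the LRT statistic and its p-value."""
    statistic = 2.0 * (loglik_full - loglik_nested)
    return statistic, chi2_sf(statistic, df)

# Hypothetical log-likelihoods for a full model and a model without L1
# (one parameter removed, hence df = 1).
stat, p_value = likelihood_ratio_test(-1500.0, -1504.2, df=1)
print(round(stat, 2), p_value < 0.05)
```

Dropping the three-level vowel predictor would remove two parameters, so the same comparison would then use df = 2.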
Second, the dynamic formant analysis used generalised additive mixed models (GAMMs) fitted with the mgcv::bam function (Wood, 2017). The non-linear differences between contours can be evaluated in light of height and shape of the trajectories; the height dimension can be modelled via parametric terms, and the shape dimension via so-called smooth terms that specify the degree of wiggliness of contours (Sóskuthy et al., 2018). Differences between a set of contours can also be directly modelled by incorporating a reference smooth (i.e., a contour at the reference level) and the difference smooth (i.e., a contour that models the degree of by-group difference of contours) (Sóskuthy, 2017). For more details about GAMMs, readers are referred to the existing tutorial papers (e.g., Sóskuthy, 2017; Sóskuthy et al., 2018; Wieling, 2018).
In the current study, I focus on differences in trajec-
tory height and shape between the speaker groups (i.e.,
English vs Japanese). Separate models were constructed
for each combination of the liquid-vowel pairings. Each
model predicts the formant values, either Bark F2–F1
or Bark F3, from parametric terms for the speaker’s first
language and gender, as well as a time-varying reference
smooth, a time-varying by-L1 difference smooth and a
time-varying by-gender difference smooth. It also includes
time-by-speaker and time-by-word random smooths.
Note, again, that English proficiency was not included
in the GAMMs together with L1, as this
resulted in inaccurate predictions of the formant trajectories
compared to the visualisations of the raw data.
Instead, similarly to the linear mixed-effect model analy-
sis, I conducted a separate analysis for the effects of pro-
ficiency using the L1 Japanese speakers’ data only and
summarise the relevant results at the end of the dynamic
analysis. The choice of including L1 Japanese speakers
only reflects the consideration that L1 English speakers’
trajectories may be different in both shape and height,
which would make it difficult for me to interpret whether
statistically significant differences result from speakers’
L1 or L1 Japanese speakers’ proficiency. This is clear
in the visualisations in Figs. 9 and 10, in which L1 English
speakers’ trajectories are distinct from those of the three
groups of L1 Japanese speakers. Further details can also
be found in the online supplementary materials.
Residual autocorrelations in the trajectories were
corrected using an autoregressive error model (AR
model). The autoregressive parameter (rho: ρ) was set to
the amount of autocorrelation at lag 1 in the model,
estimated using the start_value_rho function in the itsadug
package (van Rij et al., 2020). While this is usually an adequate
estimate, the residual autocorrelations were negative
in some cases, indicating that a lower value would be
optimal (Sóskuthy et al., 2018; Wieling, 2018). In such
cases, the new rho value was determined by exploring a
range of values and visualising the autocorrelations at lag
1 for each rho value. The final model specification across
the 12 models (two outcome variables, Bark F2–F1
and Bark F3, for two liquids, /l/ and /ɹ/, in three
vowel contexts, /æ/, /i/ and /u/) is:
    bam(Bark F2–F1 or Bark F3 ~ L1 + gender +
        s(time, bs = "cr") + s(time, by = L1, bs = "cr") +
        s(time, by = gender, bs = "cr") +
        s(time, speaker, bs = "fs", xt = "cr", m = 1) +
        s(time, word, bs = "fs", xt = "cr", m = 1),
        method = "ML")
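As a rough illustration of the starting value for rho described above, the lag-1 autocorrelation of a residual series can be computed as below. This is a Python sketch under stated assumptions, not the itsadug implementation: the function name and the residual values are hypothetical, and the series is treated as one uninterrupted trajectory, whereas the models in the study contain many separate trajectories.

```python
def lag1_autocorrelation(residuals: list[float]) -> float:
    """Sample autocorrelation at lag 1 (lag-1 autocovariance / variance)."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    denominator = sum(c * c for c in centred)
    if denominator == 0:
        raise ValueError("residuals are constant")
    numerator = sum(centred[t] * centred[t + 1] for t in range(n - 1))
    return numerator / denominator


# Hypothetical residuals from one slowly drifting trajectory.
smooth_residuals = [0.1, 0.3, 0.4, 0.3, 0.1, -0.1, -0.3, -0.4, -0.3, -0.1]
# Hypothetical residuals that flip sign at every time point.
alternating_residuals = [0.2, -0.2, 0.2, -0.2, 0.2, -0.2]

print(lag1_autocorrelation(smooth_residuals))       # positive
print(lag1_autocorrelation(alternating_residuals))  # negative
```

A negative value that remains after the AR correction corresponds to the situation in which the text above lowers rho.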
Trajectory height and shape were compared through
model comparisons using the itsadug::compareML function,
following previous research (Kirkham et al.,
2019; Sóskuthy, 2017; Sóskuthy et al., 2018), as follows:
1. I first compared (1) the full model and (2) the
nested model excluding the parametric and the
smooth terms associated with the speaker’s L1 or
gender. This allows a comparison of the overall dif-
ferences associated with these effects in both height
and shape between the two contours.
2. If the above comparison showed a significantly im-
proved model fit of the full model, I then compared
(1) the full model and (2) the nested model in-
cluding the parametric term of L1 or gender but
still excluding the by-L1 or by-gender smooth term.
This tests whether the two contours differ signifi-
cantly in shape.
If the full model was still better in the model fit af-
ter 2 above, I concluded that both trajectory height and
shape were different at a statistically significant level. If
the full model improved the model fit for 1 but not for
2, then there was only a difference in trajectory height.
Otherwise, I concluded that there was little evidence that
the two trajectories are significantly different.
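The two-step comparison above can be sketched as a small decision rule. The Python below is an illustration under stated assumptions rather than the itsadug code: the function names and the .05 threshold are assumptions, and the p-value is computed from twice the difference in ML scores against a chi-square distribution, which is the convention that reproduces the p-values tabulated for the model comparisons in this paper (e.g., a score difference of 1.10 on 2 df gives p ≈ .333).

```python
import math


def chi2_upper_tail(stat: float, df: int) -> float:
    """Upper-tail chi-square probability for a positive integer df."""
    if stat < 0 or df < 1:
        raise ValueError("need stat >= 0 and df >= 1")
    half = stat / 2.0
    if df % 2 == 0:
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    tail = math.erfc(math.sqrt(half))
    terms = (half ** (k - 0.5) / math.gamma(k + 0.5)
             for k in range(1, (df - 1) // 2 + 1))
    return tail + math.exp(-half) * sum(terms)


def comparison_p_value(score_difference: float, df: int) -> float:
    """p-value from a difference in ML scores (nested minus full)."""
    return chi2_upper_tail(2.0 * score_difference, df)


def interpret(p_overall: float, p_shape: float, alpha: float = 0.05) -> str:
    """Apply steps 1 and 2 to the two comparison p-values."""
    if p_overall >= alpha:
        return "little evidence of a difference"
    if p_shape < alpha:
        return "height and shape differ"
    return "height differs only"


# Score differences and df as tabulated for gender, /l/ in the low-vowel context.
p_overall = comparison_p_value(6.67, df=3)
p_shape = comparison_p_value(1.10, df=2)
print(interpret(p_overall, p_shape))
```

Keeping the p-value calculation separate from the interpretation step makes it easy to swap in p-values taken directly from the model comparison output.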
III. RESULTS
A. Liquid static analysis
In this section, I first present the liquid midpoint
analysis of F2–F1 and F3 using LMEs in order to investigate
the overall trends in liquid quality. The static analysis
tests the main effects of L1, vowel and gender, while
the liquid-vowel interactions are interpreted via data
visualisation. Note that the baseline participant population
(i.e., the intercept) is the female L1 English speakers in
the /æ/ context but the gender is referred to only when
the gender effect is discussed. An additional analysis of
vowel midpoints is also available in online supplementary
materials.
1. F2–F1 midpoint
The model summaries for the F2–F1 models are
shown in Table III. The lateral F2–F1 model predicts
that L1 Japanese speakers produce laterals with higher F2–F1, at 8.73
Bark, than L1 English speakers (6.74 Bark). F2–F1 for
laterals varies slightly according to the vowel context;
F2–F1 is the highest in the /i/ context, with an average
F2–F1 of 8.02 Bark, followed by /u/ (7.54 Bark)
and /æ/ (6.74 Bark). Male speakers produce laterals
with lower F2–F1 values, at 6.06 Bark.
The rhotic F2–F1 model predicts that L1 English
speakers produce rhotics in the /æ/ context at 6.38 Bark
and that L1 Japanese speakers overall produce them at 8.24 Bark. It
also predicts higher F2–F1 overall in the /i/ context (7.53
Bark) and in the /u/ context (6.98 Bark) than in the /æ/
context. Similarly to the laterals, male speakers produce
rhotics with lower F2–F1 values, at 5.63 Bark.
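The predicted values quoted above follow from adding each coefficient in Table III to the intercept, which represents female L1 English speakers in the /æ/ context. A short Python sketch of that arithmetic for the rhotic model is given below; the dictionary layout and function name are hypothetical, and the negative sign on the male coefficient is an assumption inferred from the statement that male speakers have lower values.

```python
# Rhotic F2-F1 coefficients (Bark) from Table III; the sign of "male" is
# assumed negative because the text reports lower values for male speakers.
RHOTIC_F2_F1 = {
    "intercept": 6.38,  # female L1 English speakers, baseline vowel context
    "japanese": 1.86,
    "vowel_i": 1.15,
    "vowel_u": 0.60,
    "male": -0.75,
}


def predict(coefficients: dict[str, float], *active_terms: str) -> float:
    """Intercept plus the coefficients of the active (non-baseline) levels."""
    total = coefficients["intercept"]
    for term in active_terms:
        total += coefficients[term]
    return round(total, 2)


for label, terms in [
    ("baseline", ()),
    ("L1 Japanese", ("japanese",)),
    ("/i/ context", ("vowel_i",)),
    ("/u/ context", ("vowel_u",)),
    ("male", ("male",)),
]:
    print(f"{label}: {predict(RHOTIC_F2_F1, *terms):.2f} Bark")
```

Each prediction changes one factor from the baseline at a time, which is how the values in the paragraph above are reported.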
Overall, L1 Japanese speakers produce both English
/l/ and /ɹ/ with consistently higher F2–F1 than L1 English
speakers across vowel contexts (Fig. 3), and this is
supported by the significant main effect of L1 for both
/l/ (χ²(1) = 17.58, p < .001) and /ɹ/ (χ²(1) = 15.68,
p < .001). The main effect of vowel is also shown to be
significant for both /l/ (χ²(2) = 22.74, p < .001) and /ɹ/
(χ²(2) = 22.35, p < .001). While male speakers produce
liquids with lower F2–F1 values than female speakers,
this difference was not shown to be statistically significant
for either laterals (χ²(1) = 3.23, p = .073) or rhotics
(χ²(1) = 3.28, p = .070).
2. F3 midpoint
The model summaries for the F3 models are shown
in Table IV. The lateral F3 model predicts that L1 English
speakers produce F3 at 15.83 Bark for /l/ while
L1 Japanese speakers have a slightly lower F3 at 15.54
Bark. Although model comparisons suggest significant
effects of vowel for /l/ (χ²(2) = 13.05, p = .001), the difference
seems to be quite minor; the model predicts 15.65
Bark for /l/ in the /i/ context and 15.44 Bark in the /u/
context. Finally, female speakers produce laterals with
F3 values that are 1.12 Bark higher than those of male speakers overall.
The rhotic F3 model predicts that L1 English speakers
produce 12.17 Bark for /ɹ/ whereas L1 Japanese speakers
produce higher F3 at 14.05 Bark. Similar to the laterals,
slight differences are found for /ɹ/ in the /i/ and
/u/ contexts compared to /æ/; the model predicts 12.54
Bark in the /i/ context and 12.21 Bark in the /u/ context.
The main effect of vowel is also significant here
(χ²(2) = 13.78, p = .001).
TABLE III. LME summary: Liquid F2–F1 (Bark)
Variable        β       SE     t        p(χ²)
Lateral /l/
  Intercept     6.74    0.33   20.36
  L1                                    < .001
    Japanese    1.99    0.38   5.25
  Vowel                                 < .001
    /i/         1.28    0.16   8.23
    /u/         0.80    0.18   4.50
  Gender                                0.072
    Male        −0.68   0.36   −1.86
Rhotic /ɹ/
  Intercept     6.38    0.34   18.53
  L1                                    < .001
    Japanese    1.86    0.40   4.68
  Vowel                                 < .001
    /i/         1.15    0.16   7.09
    /u/         0.60    0.13   4.58
  Gender                                0.070
    Male        −0.75   0.38   −1.97
FIG. 3. F2–F1 (Bark) at liquid midpoint. Each column shows
vowel contexts for /l/ (top row) and /ɹ/ (bottom row). Each
panel shows distributions of F2–F1 (Bark) for L1 English
(left) and L1 Japanese (right) speakers. Overlaid is the scatter
plot indicating speaker’s gender: female (grey circles) and
male (yellow triangles) speakers. (Colour online)
While the main effect of vowel influences the F3 values
only slightly for both /l/ and /ɹ/, the effects of L1
are suggested to be significant for /ɹ/ (χ²(1) = 30.62, p
< .001) but not for /l/ (χ²(1) = 1.97, p = .161). Fig. 4
seems to suggest a bimodal distribution in F3 (Bark) for
L1 English speakers, especially for /l/ in the /i/ and /u/
contexts. This seems to result from gender-related differences,
in which male speakers produced liquids with
lower F3 values than female speakers. Indeed, the effects
of gender are shown to be statistically significant for both
laterals (χ²(1) = 22.70, p < .001) and rhotics (χ²(1) =
15.87, p < .001).
FIG. 4. F3 (Bark) at liquid midpoint. Each column shows
vowel contexts for /l/ (top row) and /ɹ/ (bottom row). Each
panel shows distributions of F3 (Bark) for L1 English (left)
and L1 Japanese (right) speakers. Overlaid is the scatter plot
indicating speaker’s gender: female (grey circles) and male
(yellow triangles) speakers.
TABLE IV. LME summary: Liquid F3 (Bark)
Variable        β       SE     t        p(χ²)
Lateral /l/
  Intercept     15.83   0.18   89.35
  L1                                    .161
    Japanese    −0.29   0.20   −1.44
  Vowel                                 .001
    /i/         −0.18   0.08   −2.12
    /u/         −0.39   0.08   −4.82
  Gender                                < .001
    Male        −1.12   0.19   −5.79
Rhotic /ɹ/
  Intercept     12.17   0.25   48.56
  L1                                    < .001
    Japanese    1.88    0.26   7.18
  Vowel                                 .001
    /i/         0.37    0.08   4.47
    /u/         0.04    0.10   0.41
  Gender                                < .001
    Male        −1.15   0.25   −4.53
FIG. 5. F2–F1 (Bark) at liquid midpoint by proficiency
groups. Each column shows vowel contexts for /l/ (top row)
and /ɹ/ (bottom row). Each panel shows distributions of
F2–F1 (Bark) for L1 English speakers and three groups of
L1 Japanese speakers: advanced, intermediate and beginner,
from left to right. (Colour online)
3. Effects of L2 proficiency on the midpoint formant
measurement
In addition to the main analysis, the effects of proficiency
are tested for the three groups of L1 Japanese
speakers. Grouping is based on their subjective fluency
judgement scores: beginner (n = 10, rating 1–3), intermediate
(n = 14, rating 4), and advanced (n = 7, rating
5–6). Similarly to the main analysis, separate linear
mixed-effect models were specified in which Bark F2–F1
or Bark F3 is predicted by fixed effects of proficiency,
vowel and gender with by-item random intercepts and
by-speaker random slopes and intercepts for vowels. The
results are visualised in Figs. 5 and 6.
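The grouping rule above can be written out explicitly. The following Python sketch assumes integer ratings on a 1 to 6 scale, as implied by the reported ranges; the function name and the example ratings are hypothetical.

```python
def proficiency_group(rating: int) -> str:
    """Map a fluency rating (1-6) to the proficiency group used in the text."""
    if not 1 <= rating <= 6:
        raise ValueError("rating must be between 1 and 6")
    if rating <= 3:
        return "beginner"
    if rating == 4:
        return "intermediate"
    return "advanced"


# Hypothetical ratings for five speakers.
ratings = [2, 4, 5, 3, 6]
print([proficiency_group(r) for r in ratings])
```

If the ratings were averaged across raters and therefore non-integer, the cut-offs would need to be restated; the text does not specify this.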
The F2–F1 models suggested statistically significant
effects of proficiency on Bark F2–F1 for /ɹ/ (χ²(2) =
7.52, p = .002), in which the advanced L1 Japanese learners
of English produce rhotics with lower F2–F1 than
those in the beginner and intermediate groups. No statistically
significant proficiency effects are found for /l/
(χ²(2) = .12, p = .94). For Bark F3, no statistically
significant effects of proficiency are found for either /l/
(χ²(2) = .81, p = .67) or /ɹ/ (χ²(2) = .057, p = .97).
4. Summary: Static analysis
L1 Japanese speakers produce higher F2–F1 for both
/l/ and /ɹ/ across vowel contexts. F3 values for /l/ are
only slightly lower for L1 Japanese speakers, whereas they
produce /ɹ/ with higher F3 than L1 English speakers
across vowel contexts. Male speakers produce liquids
with lower F2–F1 and F3 values, and this was particularly
the case for F3. Finally, L1 Japanese speakers
in the advanced group produced lower F2–F1 than the
other groups for /ɹ/.
FIG. 6. F3 (Bark) at liquid midpoint by proficiency groups.
Each column shows vowel contexts for /l/ (top row) and
/ɹ/ (bottom row). Each panel shows distributions of F3
(Bark) for L1 English speakers and three groups of L1
Japanese speakers: advanced, intermediate and beginner,
from left to right. (Colour online)
B. Dynamic Analysis
Dynamic analysis in this section now focuses on variation
in F2–F1 and F3 trajectories across the liquid-vowel
interval using GAMMs. In the visualisation of the
liquid-vowel trajectories (Figs. 7 and 8), the liquid portion
corresponds roughly to the first third of the interval
and the vowel to the remaining two-thirds. Note that the
visualisation shows the predictions based on the full models.
1. F2–F1 liquid-vowel trajectory
The results of the model comparisons for the F2–F1
dynamic analysis are shown in Table V for laterals and
Table VI for rhotics. The visualisations are shown in
Fig. 7. The model comparisons show that the height and
shape of the F2–F1 trajectories are significantly different
between L1 English and L1 Japanese speakers for both
liquids in all vowel contexts. The visualisations of the
GAMMs show that the trajectories for L1 English and
L1 Japanese speakers are similar in the /i/ context (the
middle panels in Fig. 7) but look quite different in the
/æ/ (left) and /u/ (right) contexts. L1 English speakers
follow a similar tendency across the vowel contexts such
that they start from lower F2–F1 values at the onset of
the liquid, showing an increase towards the vowel target
and a slight decrease towards the offset of the vowel.
L1 Japanese speakers, on the other hand, show dis-
tinct trajectory patterns depending on vowel context. In
the /i/ context, their trajectories follow a similar ten-
dency to that of L1 English speakers, but with an earlier
rise from the liquid onset towards the vowel resulting in
a consistently higher trajectory than L1 English speakers
in the first half of the interval. In the /æ/ context, on
the other hand, the L1 Japanese speakers show the opposite
pattern to L1 English speakers, in which F2–F1
values are highest early, during the first third of the
interval and decrease to the vowel with a small rise to-
wards the end of the interval. Finally, the L1 Japanese
speakers’ trajectories in the /u/ context show smaller
fluctuations than those of L1 English speakers; the trajectory
shows an almost linear and monotonic decrease in
this vowel context.
Differences associated with gender are statistically
significant for trajectory height but not for shape for both
laterals and rhotics across the vowel contexts. This sug-
gests almost linear differences between female and male
speakers’ trajectories, in which female speakers show con-
stantly higher trajectories than male speakers, and this
is evident in Fig. 7.
TABLE V. Model comparisons for F2–F1 GAMMs for laterals
Comparison          χ²      df   p(χ²)
/l/: /æ/ context
  Overall: L1       69.88   3    < .001
  Shape: L1         66.63   2    < .001
  Overall: Gender   6.67    3    0.004
  Shape: Gender     1.10    2    0.333
/l/: /i/ context
  Overall: L1       16.54   3    < .001
  Shape: L1         9.68    2    < .001
  Overall: Gender   18.91   3    < .001
  Shape: Gender     0.34    2    0.712
/l/: /u/ context
  Overall: L1       25.41   3    < .001
  Shape: L1         23.67   2    < .001
  Overall: Gender   4.02    3    0.045
  Shape: Gender     0.07    2    0.929
2. F3 liquid-vowel trajectory
The model comparisons for F3are shown in Table VII
for laterals and in Table VIII for rhotics. The visualisa-
tions are shown in Fig. 8. The lateral-vowel trajectories
(the top half of Fig. 8) show similarities between L1 En-
glish and L1 Japanese speakers. The model comparisons
suggest that, while the trajectory shape and height are
different between L1 English and L1 Japanese speakers in
the /i/ context, the trajectories in the /æ/ and /u/ con-
texts are not statistically significantly different, with the
L1 Japanese speakers’ trajectories being slightly lower,
especially in the first half of the interval.
Even in the lateral-/i/ context, where the differences in trajectory
height and shape are statistically significant, however, a
closer look at the GAMM specifications and the
model comparisons suggests that the difference between
the two trajectories is marginal. Neither the parametric nor
the smooth terms associated with the L1 difference were
statistically significant in the model summary (β = 0.18,
SE = 0.10, t = 1.83, p = 0.07 for the parametric term;
F(6.05) = 1.79, p = 0.09 for the difference smooth). The
model comparison also suggests only a marginal improvement
in the Akaike Information Criterion (AIC) values
(1561.42 for the full model and 1565.84 for the nested
model). Fig. 8 also shows that the 95% confidence intervals
of the two trajectories overlap substantially throughout
the liquid-vowel interval.
TABLE VI. Model comparisons for F2–F1 GAMMs for rhotics
Comparison          χ²      df   p(χ²)
/ɹ/: /æ/ context
  Overall: L1       53.57   3    < .001
  Shape: L1         45.94   2    < .001
  Overall: Gender   4.10    3    0.042
  Shape: Gender     0.06    2    0.938
/ɹ/: /i/ context
  Overall: L1       39.40   3    < .001
  Shape: L1         24.09   2    < .001
  Overall: Gender   21.90   3    < .001
  Shape: Gender     0.33    2    0.723
/ɹ/: /u/ context
  Overall: L1       21.62   3    < .001
  Shape: L1         17.83   2    < .001
  Overall: Gender   4.00    3    0.046
  Shape: Gender     0.02    2    0.985
FIG. 7. The F2–F1 (Bark) trajectories predicted by GAMMs
over the liquid-vowel intervals for each liquid (rows) in each
vowel context (columns). Each panel shows predictions based
on the full model with a mean smooth and 95% confidence
interval for L1 English (blue) and L1 Japanese (red) speakers
and for female (solid) and male (dashed) speakers. (Colour
online)
The /ɹ/-vowel trajectories for F3, on the other hand,
show statistically significant differences in both trajectory
height and shape in all vowel contexts, although
both L1 English and L1 Japanese speakers share a similar
trend in the visualisation in Fig. 8. Both groups show
lower F3 values at the liquid onset, which then increase
towards the vowel, where L1 English and L1 Japanese
speakers’ trajectories seem to converge. L1 Japanese
speakers’ trajectories are overall flatter and higher than
those of L1 English speakers across all the vowel contexts.
Finally, similarly to the F2–F1 results, the gender
effect seems to be statistically significant only for the
trajectory height. This again suggests that the difference
between trajectories for female and male speakers is close
to linear (see Fig. 8).
TABLE VII. Model comparisons for F3 GAMMs for laterals
Comparison          χ²      df   p(χ²)
/l/: /æ/ context
  Overall: L1       3.12    3    0.100
  Shape: L1
  Overall: Gender   17.57   3    < .001
  Shape: Gender     1.22    2    0.295
/l/: /i/ context
  Overall: L1       4.43    3    0.031
  Shape: L1         2.53    2    0.080
  Overall: Gender   33.71   3    < .001
  Shape: Gender     5.67    2    0.003
/l/: /u/ context
  Overall: L1       1.81    3    0.306
  Shape: L1
  Overall: Gender   29.91   3    < .001
  Shape: Gender     0.00    2    1.000
TABLE VIII. Model comparisons for F3 GAMMs for rhotics
Comparison          χ²      df   p(χ²)
/ɹ/: /æ/ context
  Overall: L1       17.36   3    < .001
  Shape: L1         10.32   2    < .001
  Overall: Gender   8.26    3    < .001
  Shape: Gender     1.05    2    0.350
/ɹ/: /i/ context
  Overall: L1       43.55   3    < .001
  Shape: L1         26.89   2    < .001
  Overall: Gender   22.40   3    < .001
  Shape: Gender     3.21    2    0.041
/ɹ/: /u/ context
  Overall: L1       27.42   3    < .001
  Shape: L1         8.31    2    < .001
  Overall: Gender   25.96   3    < .001
  Shape: Gender     2.87    2    0.057
3. Effects of L2 proficiency on formant trajectories
Similarly to the static analysis, the effects of L1
Japanese speakers’ proficiency have been tested separately
from the main analysis. For each liquid-vowel
pairing, the model predicts Bark F2–F1 or Bark F3
with parametric terms of proficiency and gender, a
time-varying reference smooth, a time-varying by-proficiency
difference smooth and a time-varying by-gender difference
smooth. The random effect is accounted for by
time-by-speaker and time-by-word random smooths. The visualisations
are shown in Figs. 9 and 10; please note that
the predictions shown in these figures are based on the
models excluding parametric and smooth terms associated
with gender because the plots would be too crowded
to interpret otherwise.
FIG. 8. The F3 (Bark) trajectories predicted by GAMMs
over the liquid-vowel intervals for each liquid (rows) in each
vowel context (columns). Each panel shows predictions based
on the full model with a mean smooth and 95% confidence
interval for L1 English (blue) and L1 Japanese (red) speakers
and for female (solid) and male (dashed) speakers. (Colour
online)
The analyses for Bark F2–F1 suggest a statistically
significant effect of proficiency on the trajectory height
for /ɹ/ in the /u/ context (χ²(6) = 10.24, p = .002),
in which the F2–F1 trajectory for the advanced group
is lower than those for the beginner or intermediate groups. The
visualisation in Fig. 9, however, shows that the trajectory
shape is quite different between L1 English speakers
and the advanced L1 Japanese speakers. For Bark F3,
no statistically significant effects of proficiency are found
for either /l/ or /ɹ/ for the L1 Japanese speakers.
4. Summary: Dynamic analysis
The dynamic analysis shows substantial variability
in the liquid-vowel realisations between L1 English and
L1 Japanese speakers. Shape and height are significantly
different for the F2–F1 trajectories for both /l/ and /ɹ/,
with differences associated not only with the liquid portion
corresponding to the first third of the interval but
also with the transition patterns into the vowel. The F3
trajectories for /l/ are largely comparable between L1 English
and L1 Japanese speakers with little evidence of
statistically significant differences. The F3 trajectories
for /ɹ/, on the other hand, differ substantially in the
first half of the interval corresponding to the liquid portion.
The effects of gender are manifested almost exclusively
on the trajectory height, meaning a linear difference
between trajectories for female and male speakers.
Although advanced L1 Japanese speakers produced
lower F2–F1 trajectories in the /ɹ/-/u/ context than the
beginner and intermediate groups, the trend is quite different
from that of L1 English speakers.
FIG. 9. The F2–F1 (Bark) trajectories illustrating differences
between the different proficiency groups among L1 Japanese
speakers predicted by GAMMs over the liquid-vowel intervals
for each liquid (rows) in each vowel context (columns).
Each panel shows predictions based on the model excluding
parametric and smooth terms associated with gender for
simplicity, with a mean smooth and 95% confidence interval
for advanced (blue), intermediate (red), beginner (green) L1
Japanese speakers and L1 English speakers (orange). (Colour
online)
IV. DISCUSSION
A. Spectro-temporal variability in L2 English liquids
The current paper aims to capture time-varying
acoustic properties of English liquids produced by L1 English
and L1 Japanese speakers. It combines two analyses
of F2–F1 and F3: the static analysis at the liquid
midpoint and the dynamic analysis over the liquid-vowel
interval. The liquid midpoint analysis suggests that L1
Japanese speakers consistently produce higher F2–F1 for
both English /l/ and /ɹ/ and higher F3 for /ɹ/ than L1
English speakers across vowel contexts. The dynamic
analysis, on the other hand, shows that the between-L1
differences are non-linear, highlighting the complexity associated
with the production of liquids and liquid-vowel
coarticulation.
Comparing the effects of speaker gender and L1
demonstrates the importance of dynamic information in
the liquid-vowel sequences. The static analysis shows
that male speakers generally produce English liquids with
lower F2–F1 and F3 frequencies than female speakers,
and the gender difference is statistically significant for F3.
FIG. 10. The F3 (Bark) trajectories illustrating differences
between the different proficiency groups among L1 Japanese
speakers predicted by GAMMs over the liquid-vowel inter-
vals for each liquid (rows) in each vowel context (columns).
Each panel shows predictions based on the model exclud-
ing parametric and smooth terms associated with gender for
simplicity, with a mean smooth and 95% confidence interval
for advanced (blue), intermediate (red), beginner (green) L1
Japanese speakers and L1 English speakers (orange). (Colour
online)
The dynamic analysis further shows clearly that the spec-
tral difference between female and male speakers seems
to be linear; GAMMs model comparisons suggest statis-
tically significant differences in trajectory height but not
in trajectory shape, and it is quite clear from the visualisations
in Figs. 7 and 8 that the differences in trajectories
between female and male speakers are (almost) linear.
The dynamic difference associated with speaker L1,
on the other hand, paints a much more complicated
picture. While the time-varying analysis of F3 for /l/
indicates little difference between L1 English and L1
Japanese speakers, the F3 values for /ɹ/ show a clear
between-L1 difference in the first half of the interval,
indicating differences in acoustic realisations of liquids and
the transition into the vowel. Also, the trajectory shape
associated with L1 Japanese speakers’ /ɹ/ is flatter,
resulting in a smaller distinction between /ɹ/ and the vowels.
The two language groups differ slightly in the point
in time at which F3 achieves its maximum, such that
L1 Japanese speakers seem to achieve the vowel target
earlier than L1 English speakers do.
The F2–F1 trajectories further highlight the non-linear
between-L1 differences in the trajectory (Fig. 7).
In particular, L1 Japanese speakers show distinct trajec-
tory patterns across vowel contexts, suggesting that their
production of English liquids is subject to greater influ-
ence from the following vowels than that of L1 English
speakers. The liquid-/i/ trajectories, for example, suggest
that L1 Japanese speakers reach the vowel target
earlier than L1 English speakers, given the earlier onset
of the plateau, despite a similar trajectory pattern. The
linear trend for the liquid-/u/ trajectories also indicates
that L1 Japanese speakers do not clearly distinguish the
liquid and the vowel on F2–F1.
The separate static analyses on the effects of L1
Japanese speakers’ English proficiency demonstrated
that advanced L1 Japanese-speaking learners of English
produced lower F2–F1 values for /ɹ/ than the other two
groups. Given that L1 English speakers produced lower
F2–F1 values for /ɹ/ at the liquid midpoint, the findings
support the previous claims that English /ɹ/ is
easier for L1 Japanese speakers to learn than English
/l/ is (Aoyama et al., 2004), and that the use of F2
and F1 may be easier for them to acquire than that of
F3 (Saito and Munro, 2014). The dynamic analysis in
the current study further demonstrates that advanced L1
Japanese speakers’ F2–F1 trajectory is statistically significantly
lower in the /ɹ/-/u/ context than those of the other two
groups. While this could be taken as evidence of
proficiency effects, the linear trend of the trajectories across
proficiency groups also suggests that even advanced L1
Japanese speakers do not seem to differentiate /ɹ/ and
/u/. Fundamentally, this lack of liquid-vowel differentiation
might demonstrate a general influence from their
L1 (i.e., Japanese). Further research is clearly needed to
investigate the effects of L2 proficiency on the formant
dynamics by employing more rigorous measures of L2
proficiency, especially given that acoustic profiles of L2
English liquids can be complex (Aoyama et al., 2019).
Overall, the dynamic analysis suggests that L1
Japanese speakers seem to differ not only in acoustic tar-
gets of English liquids, as captured in the static anal-
ysis, but also in the transition between the liquid and
the vowel. The results are in line with previous findings
that the magnitude and timing of spectral changes
in the production of English liquids by L1 English-speaking
children (Howson and Redford, 2021) and by
L2 learners of English (Espinal et al., 2020) differ from those
of adult L1 English speakers. These non-linear between-
language differences could point to some possible mecha-
nisms whereby L1 Japanese speakers struggle to produce
English liquids accurately in light of L2 speech learning.
B. Acquisition of English /l/ and /ɹ/ by L1 Japanese speakers
The overarching question in this study concerns how
L1 Japanese speakers differ from L1 English speakers in
dynamic acoustic realisations of word-initial English liquids
as a function of following vowels. The static analysis
suggests that both the speaker’s L1 and vowel context
influence the acoustic realisations of word-initial English
/l/ and /ɹ/. The L1 effect is unsurprising, given that
it largely agrees with previous findings that L1 Japanese
speakers produce both English /l/ and /ɹ/ with higher F2
and F3 values than L1 English speakers (Aoyama et al.,
2019; Flege et al., 1995; Saito and van Poeteren, 2018).
Regarding the vowel effect, the static analysis suggests
a general tendency that liquids in the /i/ context are
produced with higher F2–F1 values than in the /u/ context,
whereas the /æ/ context seems to yield the lowest
F2–F1 values for liquids. This could be explained in
light of previous findings that the F2 values in English
liquids tend to be higher when preceding a high vowel
/i/ than a low vowel /a/ due to different articulatory demands
on the tongue dorsum configurations (Recasens,
2012).
The dynamic results demonstrate that L1 Japanese
speakers show different liquid-vowel coarticulatory
patterns depending on the following vowel, compared
to L1 English speakers, whose trajectory patterns
are consistent across the vowel contexts. The liquid-/u/
trajectories, in particular, suggest that L1 Japanese
speakers make a less clear distinction between the liquid
and the vowel in the /u/ context. This could corroborate
previous perceptual findings that L1 Japanese
speakers are more likely to perceive a /w/-like percept
when perceiving English /l/ and /ɹ/, resulting in a confusion
between English /l ɹ/ and other categories (e.g.,
/w/ or [ɯɾ]) and therefore in less success in identifying
word-initial liquids in the back vowel context than in the
front vowel context (Best and Strange, 1992; Guion et al.,
2000; Mochizuki, 1981; Shimizu and Dantsuji, 1983). The
data in this study demonstrate that such confusion arising
from the vocalic component of English liquids in perception
could also be observed in L1 Japanese speakers’
production.
Generally, L1 Japanese speakers produce higher F3
for English /ɹ/ (Aoyama et al., 2019; Flege, 1995; Saito
and Munro, 2014). This is apparent in both static and
dynamic analyses; in particular, the dynamic analysis for
F3 in Fig. 8 shows that the by-group difference largely lies
in the liquid portion, suggesting that the difference
in F3 can be attributed to the liquid realisations. Previous
research claims that F2 is an easier acoustic cue for
L1 Japanese speakers to acquire (e.g., Saito and Munro,
2014; Saito and van Poeteren, 2018). While variations in
F1 could be negligible between the two speaker populations
(e.g., Flege et al., 1995; Saito and Munro, 2014),
this claim does not explain well why the F2–F1 trajectories,
which could derive from variations of F2, are
significantly different in both height and shape between
L1 Japanese and L1 English speakers (see Fig. 7). It
could therefore be argued that the static analysis only
captures a snapshot of acoustical realisations of English
liquids, when, in fact, L1 Japanese speakers differ from
L1 English speakers in the dynamic spectral characteristics
during the liquid-vowel interval.
In addition, an anonymous reviewer suggested the possibility
that L1 Japanese speakers might use different dynamic
strategies to make a contrast (e.g., through F2)
compared to L1 English speakers. It would, therefore, be
worthwhile to investigate how L1 Japanese speakers use
dynamic information to make such a phonological contrast,
especially given that the Perceptual Assimilation
Model of L2 Speech Learning (PAM-L2) makes predictions
about how L2 speakers assimilate L2 phonological
contrasts into their L1 phonology (Best and Tyler, 2007).
Theoretically, the Speech Learning Model (SLM)
posits that L2 learners store representations of the L2
sounds at the level of the position-sensitive allophones
(Flege, 1995; Flege and Bohn, 2021), and previous studies
show that L1 Japanese speakers’ perception of English
/l/ and /ɹ/ is highly subject to the phonetic context
and the coarticulatory effects of neighbouring segments
(Mochizuki, 1981; Sheldon and Strange, 1982).
Taken together, the current results demonstrate that L1
Japanese speakers are influenced by the phonetic details
of L2 English liquids not only in perception but also in
production; L1 Japanese speakers differ in how clearly
they dissociate the liquid from the vowel, especially in
the /u/ context, and this is manifested in their production
as different patterns of liquid-vowel coarticulation.
To summarise, the present study shows that the temporal
spectral changes during the liquid-vowel intervals
are significantly different between L1 English and L1
Japanese speakers along F2–F1 for both liquids and F3
for /ɹ/. The liquid-vowel trajectories of F2–F1 in the /i/
and /u/ contexts highlight particularly notable temporal
variability in the L1 Japanese speakers’ data, suggesting
that liquid-vowel coarticulation could be considered
one of the production properties that L1 Japanese
speakers need to acquire in the production of English liquids.
V. CONCLUSION
The present study examines the acoustics of L1
Japanese and L1 English speakers’ production of
word-initial English liquids. The key finding is that L1
Japanese speakers differ from L1 English speakers in the
coarticulatory pattern between the liquid and the vowel.
The dynamic analysis using GAMMs not only generally
agrees with the findings from the static analysis but also
highlights the robust yet complicated differences between
L1 and L2 speech in the formant dynamics. Overall, this
study illustrates that the dynamic characteristics are im-
portant aspects involved in production of English liquids
in the context of L2 speech learning. Directly studying
formant dynamics opens discussions around the specific
underlying mechanism of L2 speech production under the
influence of speakers’ L1, and future research will comple-
ment the current results using articulatory methods for
a better understanding of the factors that may underlie
differences in acoustic dynamics shown in this study.
DATA AVAILABILITY
Data and code that support the findings of this
study are openly available in the Open Science Framework
(OSF) repository at https://osf.io/2phx5/.
ACKNOWLEDGMENTS
I thank Dr Claire Nance and Dr Sam Kirkham for
their comments and support. Prof. Noriko Nakanishi,
Prof. Yuri Nishio, and Dr Bronwen Evans helped me
with data collection. The research is financially supported
by the Graduate Scholarship for Degree-Seeking Students
from the Japan Student Services Organization and the
2022 Research Grant from the Murata Science Foundation.
AUTHOR DECLARATIONS
The author confirms no conflicts of interest. This
research was approved by the ethics committees at Lancaster
University, Kobe Gakuin University and Meijo University.
Informed consent was obtained from all participants.
Akamatsu, T. (1997). Japanese Phonetics: Theory and Practice
(Lincom Europa, München, Newcastle).
Aoyama, K., Flege, J. E., Akahane-Yamada, R., and Yamada, T.
(2019). “An acoustic analysis of American English liquids by
adults and children: Native English speakers and native Japanese
speakers of English,” The Journal of the Acoustical Society of
America 146(4), 2671–2681, doi: 10.1121/1.5130574.
Aoyama, K., Flege, J. E., Guion, S. G., Akahane-Yamada, R., and
Yamada, T. (2004). “Perceived phonetic dissimilarity and L2
speech learning: The case of Japanese /r/ and English /l/ and
/r/,” Journal of Phonetics 32(2), 233–250, doi:
10.1016/S0095-4470(03)00036-6.
Arai, T. (2013). “On Why Japanese /r/ Sounds are Difficult for
Children to Acquire,” in Proc. Interspeech 2013, ISCA, Lyon,
France, pp. 2445–2449, doi: 10.21437/Interspeech.2013-568.
Articulate Instruments (2022). “Articulate Assistant Advanced
version 220” Articulate Instruments.
Barreda, S. (2021). “Fast Track: Fast (nearly) automatic formant-
tracking using Praat,” Linguistics Vanguard 7(1), doi:
10.1515/lingvan-2020-0051.
Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). “Fitting
Linear Mixed-Effects Models Using lme4,” Journal of Statistical
Software 67, 1–48, doi: 10.18637/jss.v067.i01.
Beristain, A. M. (2022). “The acquisition of acoustic and aero-
dynamic patterns of coarticulation in second and heritage lan-
guages,” Ph.D. thesis, University of Illinois Urbana-Champaign.
Best, C. T., and Strange, W. (1992). “Effects of phonological and
phonetic factors on cross-language perception of approximants,”
Journal of Phonetics 20(3), 305–330, doi:
10.1016/S0095-4470(19)30637-0.
Best, C. T., and Tyler, M. D. (2007). “Nonnative and second-
language speech perception: Commonalities and complementari-
ties,” in Language Experience in Second Language Speech Learn-
ing: In Honor of James Emil Flege, edited by O.-S. Bohn and
M. J. Munro (John Benjamins Publishing Company, Amster-
dam), pp. 13–34, doi: 10.1075/lllt.17.07bes.
Boersma, P., and Weenink, D. (2022). “Praat: Doing Phonetics
by Computer.”
Campbell, F., Gick, B., Wilson, I., and Vatikiotis-Bateson, E.
(2010). “Spatial and Temporal Properties of Gestures in North
American English /r/,” Language and Speech 53(1), 49–69, doi:
10.1177/0023830909351209.
Carter, P., and Local, J. (2007). “F2 variation in Newcastle and
Leeds English liquid systems,” Journal of the International
Phonetic Association 37(2), 183–199, doi:
10.1017/S0025100307002939.
Espinal, A., Thompson, A., and Kim, Y. (2020). “Acoustic
characteristics of American English liquids /ɹ/, /l/, /ɹl/ produced
by Korean L2 adults,” The Journal of the Acoustical Society of
America 148(2), EL179–EL184, doi: 10.1121/10.0001758.
Espy-Wilson, C. Y. (1992). “Acoustic measures for linguistic fea-
tures distinguishing the semivowels /w j r l/ in American En-
glish,” The Journal of the Acoustical Society of America 92(2),
736–757, doi: 10.1121/1.403998.
Flege, J. E. (1995). “Second Language Speech Learning The-
ory, Findings and Problems,” in Speech Perception and Linguis-
tic Experience: Issues in Cross-Language Research, edited by
W. Strange (York Press), pp. 233–277.
Flege, J. E., and Bohn, O.-S. (2021). “The Revised Speech Learn-
ing Model (SLM-r),” in Second Language Speech Learning: The-
oretical and Empirical Progress, edited by R. Wayland, first ed.
(Cambridge University Press), pp. 3–83, doi:
10.1017/9781108886901.002.
Flege, J. E., Takagi, N., and Mann, V. (1995). “Japanese Adults
can Learn to Produce English /ɹ/ and /l/ Accurately,” Language
and Speech 38(1), 25–55, doi: 10.1177/002383099503800102.
Grosjean, F. (2008). “The bilingual’s language mode,” in Studying
Bilinguals, Oxford Linguistics (Oxford University Press,
Oxford; New York), pp. 36–66.
Guion, S. G., Flege, J. E., Akahane-Yamada, R., and Pruitt, J. C.
(2000). “An investigation of current models of second language
speech perception: The case of Japanese adults’ perception of
English consonants,” The Journal of the Acoustical Society of
America 107(5), 2711–2724, doi: 10.1121/1.428657.
Harrington, J. (2021). “emuR - Main package of the EMU
Speech Database Management System” Institute of Phonetics
and Speech Processing, University Munich.
Hattori, K., and Iverson, P. (2009). “English /r/-/l/ category as-
similation by Japanese adults: Individual differences and the link
to identification accuracy,” The Journal of the Acoustical Society
of America 125(1), 469–479, doi: 10.1121/1.3021295.
Howson, P. J., and Redford, M. A. (2021). “The Acquisition of
Articulatory Timing for Liquids: Evidence From Child and Adult
Speech,” Journal of Speech, Language, and Hearing Research
64(3), 734–753, doi: 10.1044/2020_JSLHR-20-00391.
Iverson, P., Kuhl, P. K., Akahane-Yamada, R., Diesch, E.,
Tohkura, Y., Kettermann, A., and Siebert, C. (2003). “A percep-
tual interference account of acquisition difficulties for non-native
phonemes,” Cognition 87(1), B47–B57, doi:
10.1016/S0010-0277(02)00198-1.
Keating, P. A. (1985). “Universal Phonetics and the Organization
of Grammars,” in Phonetic Linguistics: Essays in Honor of Pe-
ter Ladefoged (Academic Press), pp. 115–132.
King, H., and Ferragne, E. (2020). “Loose lips and tongue tips:
The central role of the /r/-typical labial gesture in
Anglo-English,” Journal of Phonetics 80, 100978, doi:
10.1016/j.wocn.2020.100978.
Kirkham, S. (2017). “Ethnicity and phonetic variation in Sheffield
English liquids,” Journal of the International Phonetic Associa-
tion 47(1), 17–35, doi: 10.1017/S0025100316000268.
Kirkham, S., Nance, C., Littlewood, B., Lightfoot, K., and
Groarke, E. (2019). “Dialect variation in formant dynamics: The
acoustics of lateral and vowel sequences in Manchester and Liver-
pool English,” The Journal of the Acoustical Society of America
145(2), 784–794, doi: 10.1121/1.5089886.
Ladefoged, P., and Johnson, K. (2010). A Course in Phonetics,
International Edition, sixth ed. (Wadsworth, Boston, MA).
Lawson, E., Stuart-Smith, J., Scobbie, J. M., Yaeger-Dror, M., and
Maclagan, M. (2011). “Liquids,” in Sociophonetics: A Student’s
Guide, edited by M. Di Paolo and M. Yaeger-Dror (Routledge),
pp. 72–86.
McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., and Son-
deregger, M. (2017). “Montreal Forced Aligner: Trainable Text-
Speech Alignment Using Kaldi,” in Interspeech 2017, ISCA, pp.
498–502, doi: 10.21437/Interspeech.2017-1386.
Mochizuki, M. (1981). “The identification of /r/ and /l/ in natural
and synthesized speech,” Journal of Phonetics 9(3), 283–303, doi:
10.1016/S0095-4470(19)30972-6.
Morimoto, M. (2020). “Geminated Liquids in Japanese: A Produc-
tion Study,” Ph.D. thesis, University of California Santa Cruz.
Proctor, M. (2011). “Towards a gestural characterization of liq-
uids: Evidence from Spanish and Russian,” Laboratory Phonol-
ogy 2(2), 451–485, doi: 10.1515/labphon.2011.017.
Proctor, M., Walker, R., Smith, C., Szalay, T., Goldstein, L., and
Narayanan, S. (2019). “Articulatory characterization of English
liquid-final rimes,” Journal of Phonetics 77, 100921, doi:
10.1016/j.wocn.2019.100921.
R Core Team (2022). “R: A Language and Environment for Sta-
tistical Computing” R Foundation for Statistical Computing.
Recasens, D. (1991). “On the production characteristics of api-
coalveolar taps and trills,” Journal of Phonetics 19(3-4), 267–
280, doi: 10.1016/S0095-4470(19)30344-4.
Recasens, D. (2012). “A cross-language acoustic study of initial
and final allophones of /l/,” Speech Communication 54(3), 368–
383, doi: 10.1016/j.specom.2011.10.001.
Riney, T. J., Takada, M., and Ota, M. (2000). “Segmentals and
Global Foreign Accent: The Japanese Flap in EFL,” TESOL
Quarterly 34(4), 711–737, doi: 10.2307/3587782.
Saito, K., and Munro, M. J. (2014). “The Early Phase of /ɹ/
Production Development in Adult Japanese Learners of English,”
Language and Speech 57(4), 451–469, doi:
10.1177/0023830913513206.
Saito, K., and van Poeteren, K. (2018). “The
perception–production link revisited: The case of Japanese learners’
English /ɹ/ performance,” International Journal of Applied
Linguistics 28(1), 3–17, doi: 10.1111/ijal.12175.
Setter, J., and Jenkins, J. (2005). “Pronunciation,” Language
Teaching 38(1), 1–17, doi: 10.1017/S026144480500251X.
Sheldon, A., and Strange, W. (1982). “The acquisition of /r/ and
/l/ by Japanese learners of English: Evidence that speech pro-
duction can precede speech perception,” Applied Psycholinguis-
tics 3(3), 243–261, doi: 10.1017/S0142716400001417.
Shimizu, K., and Dantsuji, M. (1983). “A Study on the Perception
of /r/ and /l/ in Natural and Synthetic Speech Sounds,” Studia
phonologica 17, 1–14, doi: 10.1016/S0095-4470(19)30972-6.
Sóskuthy, M. (2017). “Generalised additive mixed models for
dynamic analysis in linguistics: A practical introduction,” doi:
10.48550/arXiv.1703.05339.
Sóskuthy, M., Foulkes, P., Hughes, V., and Haddican, B. (2018).
“Changing Words and Sounds: The Roles of Different Cognitive
Units in Sound Change,” Topics in Cognitive Science 10(4), 787–
802, doi: 10.1111/tops.12346.
Sproat, R., and Fujimura, O. (1993). “Allophonic variation in
English /l/ and its implications for phonetic implementation,”
Journal of Phonetics 21(3), 291–311, doi:
10.1016/S0095-4470(19)31340-3.
Stevens, K. N. (2000). Acoustic Phonetics (The MIT Press).
van Rij, J., Wieling, M., Baayen, R. H., and van Rijn, H. (2020).
“Itsadug: Interpreting Time Series and Autocorrelated Data
Using GAMMs.”
Wells, J. C. (2008). Longman Pronunciation Dictionary, third ed.
(Pearson Education Ltd.).
West, P. (1999a). “The extent of coarticulation of English liquids:
An acoustic and articulatory study,” in Proceedings of the 14th
International Congress of Phonetic Sciences (ICPhS-14), edited
by J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, and A. C.
Bailey, San Francisco, CA, USA, pp. 1901–1904.
West, P. (1999b). “Perception of distributed coarticulatory proper-
ties of English /l/ and /r/,” Journal of Phonetics 27(4), 405–426,
doi: 10.1006/jpho.1999.0102.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D.,
François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J.,
Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller,
K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi,
K., Vaughan, D., Wilke, C., Woo, K., and Yutani, H. (2019).
“Welcome to the Tidyverse,” Journal of Open Source Software
4(43), 1686, doi: 10.21105/joss.01686.
Wieling, M. (2018). “Analyzing dynamic phonetic data using gen-
eralized additive mixed modeling: A tutorial focusing on articula-
tory differences between L1 and L2 speakers of English,” Journal
of Phonetics 70, 86–116, doi: 10.1016/j.wocn.2018.03.002.
Winter, B. (2020). Statistics for Linguists: An Introduction Using
R (Routledge).
Wood, S. N. (2017). Generalized Additive Models: An Introduction
with R, second ed. (Chapman and Hall/CRC, New York).
Yamada, R. A., and Tohkura, Y. (1992). “The effects of experi-
mental variables on the perception of American English /r/ and
/l/ by Japanese listeners,” Perception & Psychophysics 52(4),
376–392, doi: 10.3758/BF03206698.
Yamane, N., Howson, P., and Po-Chun (Grace), W. (2015). “An
ultrasound examination of taps in Japanese,” in Proceedings of
the 18th International Congress of Phonetic Sciences.
Ying, J., Shaw, J. A., Kroos, C., and Best, C. T. (2012). “Rela-
tions Between Acoustic and Articulatory Measurements of /l/,”
in Proceedings of the 14th Australasian International Conference
on Speech Science and Technology, Sydney, pp. 109–112.
Zimmermann, G. N., Price, P., and Ayusawa, T. (1984). “The
production of English /r/ and /l/ by two Japanese speakers dif-
fering in experience with English,” Journal of Phonetics 12(3),
187–193, doi: 10.1016/S0095-4470(19)30873-3.