
Assessing phonological control of parasagittal tongue shape in Japanese sibilants

Authors:
Michael C. Stern (a), Jason A. Shaw (a), Shigeto Kawahara (b)
(a) Yale University
(b) Keio University
1/7/22 96th Annual Meeting of the LSA 1
“Phonological control”
Abstract phonological primitives (e.g., features or gestures)
correspond in some way to physical dimensions, whether
articulatory, auditory/acoustic, or a combination thereof
Other dimensions—those not under phonological control—may
passively covary with controlled dimensions
For example: f0 during a vowel can be raised as a passive
consequence of actively closing the vocal folds for [ʔ] (e.g., Hombert et
al., 1979)
This talk
Which dimensions are under phonological control during
Japanese sibilant production?
Japanese has two sibilants which are similar to English sibilants
Japanese: anterior [s] vs. posterior [ɕ]; English: anterior [s] vs.
posterior [ʃ]
Precise phonetic difference between Japanese [ɕ] and English
[ʃ] is somewhat unclear
There are also by-language differences in phonological
patterning and acquisition error patterns
English [s] and [ʃ] differ in parasagittal tongue
shape: Ultrasound data
Stone & Lundberg (1996)
[s] [ʃ]
Whalen et al. (2011)
Deeper and narrower groove for [s] than [ʃ]
English sibilants: Electropalatography (EPG)
More alveolar contact for [s] (right) than [ʃ] (left)
Also more post-alveolar contact for [s] than [ʃ]
Consistent with deeper and narrower groove for [s] than [ʃ]
Pouplier et al. (2011)
[ʃ] [s]
Japanese sibilants: EPG data
Like English, more alveolar contact for [s] than [ɕ]
Unlike English, less post-alveolar contact for [s] than [ɕ]
By-language difference in parasagittal control?
[s]
[ɕ]
Matsui (2017)
Differences in phonological patterning
The English sibilant contrast is less susceptible to influence from
surrounding vowels:
English sibilants contrast before all vowels (but the contrast is
neutralized before certain clusters, e.g., [strit] ~ [ʃtrit])
The Japanese sibilant contrast is more limited
Complementary distribution in native words (Yamato lexical stratum):
[ɕ] occurs before [i], and [s] occurs before all other vowels (/si/ → [ɕi])
In Sino-Japanese and recent loans, [s] and [ɕ] contrast before non-front
vowels ([ɕa, ɕu, ɕo]), but rarely before [i]
Perhaps this is due to a difference in phonological control
Differences in acquisition error patterns
Li et al. (2009):
English-learning children tend to replace /ʃ/ with [s]
Japanese-learning children tend to replace /s/ with [ɕ]
Perhaps English-learning and Japanese-learning children learn
different dimensions of phonological control
Hypothesis
These phonetic, phonological, and acquisition facts follow from a
by-language difference in phonological control:
English sibilant production involves active parasagittal control
[s] = deep, narrow groove
[ʃ] = wider groove or doming
Japanese sibilant production does not
[s] ~ [ɕ] contrast is maintained by midsagittal constriction
location
This study
Investigate parasagittal control during Japanese sibilant production
using 3D Electromagnetic Articulography (EMA)
Participants: Three adult native Japanese speakers
S01: Male, 30s, Tokyo
S02: Male, 30s, Osaka
S03: Female, 30s, Tokyo
Materials & Procedure
24 real Japanese words beginning with either [s] or [ɕ] followed by
either [u] or [i]
Carrier phrase: okee ___ to itte ‘okay say ___ again’
Stimuli were presented on screen in Japanese orthography
Each item was presented in random order within a block (15 blocks)
Total number of tokens included in analysis = 942
Data collection
NDI Wave EMA system
sampling at 100 Hz
Lingual sensors:
Tongue dorsum (TD)
Tongue blade (TB)
Tongue tip (TT)
Parasagittal tongue left (PTL)
Parasagittal tongue right (PTR)
Primary parasagittal measure
Angle under the tongue (γ) in degrees
calculated using the law of cosines
(Howson et al., 2015)
$\gamma = \arccos\left(\dfrac{LB^{2} + RB^{2} - LR^{2}}{2 \cdot LB \cdot RB}\right) \cdot \dfrac{180}{\pi}$
LB is the Euclidean distance between TB and PTL; RB is the Euclidean distance
between TB and PTR; and LR is the Euclidean distance between PTL and PTR
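As a minimal sketch of this measure, the angle at TB can be computed from three 3D sensor positions with the law of cosines. The function name, the coordinate convention, and the example positions are illustrative assumptions, not values from the study.

```python
import math

def angle_under_tongue(tb, ptl, ptr):
    """Return gamma in degrees: the angle at TB between the PTL and PTR sensors."""
    lb = math.dist(tb, ptl)   # Euclidean distance TB to PTL
    rb = math.dist(tb, ptr)   # Euclidean distance TB to PTR
    lr = math.dist(ptl, ptr)  # Euclidean distance PTL to PTR
    cos_gamma = (lb**2 + rb**2 - lr**2) / (2 * lb * rb)
    # Guard against floating-point values just outside [-1, 1].
    cos_gamma = max(-1.0, min(1.0, cos_gamma))
    return math.degrees(math.acos(cos_gamma))

# Hypothetical sensor positions in mm (x = left-right, y = front-back, z = height).
flat = angle_under_tongue((0, 0, 0), (-10, 0, 0), (10, 0, 0))
domed = angle_under_tongue((0, 0, 5), (-10, 0, 0), (10, 0, 0))
print(round(flat, 2), round(domed, 2))
```

In this toy configuration, raising TB relative to the two parasagittal sensors shrinks the angle, so a smaller gamma corresponds to a more domed shape.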
Qualitative results
(1) [ɕ] tends to be more domed than [s]
(2) [ɕ] tends to have a higher TB than [s]
(3) [ɕ] tends to have a lower TT than [s]
[ɕ]
[s]
Hypotheses from qualitative patterns
[ɕ] is articulated primarily with the TB (TB under phonological control)
[s] is articulated primarily with the TT (TT under phonological control)
Domed shape of [ɕ] is a passive consequence of raising the TB
Prediction: Negative relationship between TB height and angle
under the tongue (γ), regardless of segment
Quantitative analysis
Acoustic data were force-aligned using WebMAUS (Kisler et al., 2017)
Gamma (doming) and TB height were calculated at the temporal
midpoint of each sibilant token
Examine average TB height and gamma by segment, as well as
relationship between TB height and gamma
Linear mixed effects models
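The midpoint measurement can be sketched as follows, assuming that each sibilant has start and end times (in seconds) from the forced alignment and that sample 0 of the 100 Hz sensor trajectory is time-aligned with the start of the audio. The function name and the example times are hypothetical.

```python
def midpoint_index(start_s, end_s, sample_rate_hz=100):
    """Return the index of the sensor sample nearest the temporal midpoint of a segment."""
    midpoint_s = (start_s + end_s) / 2
    return round(midpoint_s * sample_rate_hz)

# Hypothetical sibilant interval from 1.20 s to 1.36 s.
idx = midpoint_index(1.20, 1.36)
print(idx)
```

Gamma and TB height would then be read from the sensor positions at that sample index.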
TB height by segment [figure]
Angle under the tongue (gamma) by segment [figure; axis labeled "More domed" to "Less domed"]
Relationship between TB height and gamma [figure; axis labeled "More domed" to "Less domed"]
Trimmed dataset (> 1.5 SD below the mean) [figure; axis labeled "More domed" to "Less domed"]
Interim summary
[ɕ] has higher TB than [s]
[ɕ] is more domed than [s]
For both segments, higher TB = more doming
Can the by-segment difference in doming be entirely explained
by the by-segment difference in TB height?
Or is segment type independently predictive of doming?
Linear mixed effects models
To test this, we fit nested linear mixed effects models to gamma
Both gamma and TB height were z-scored
Segment identity was sum-coded: [s] = 1, [ɕ] = -1
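A minimal sketch of this preprocessing is given below. It assumes z-scoring over all tokens pooled together using the sample standard deviation; the slides do not say whether scaling was pooled or by speaker. The toy values and variable names are hypothetical.

```python
from statistics import mean, stdev

def z_score(values):
    """Center values on their mean and scale by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Sum coding as stated on the slide: [s] = 1, [ɕ] = -1.
SUM_CODE = {"s": 1, "ɕ": -1}

# Hypothetical TB heights (mm) and segment labels for four tokens.
tb_height = [2.0, 4.0, 6.0, 8.0]
segments = ["s", "s", "ɕ", "ɕ"]

tb_z = z_score(tb_height)
seg_coded = [SUM_CODE[seg] for seg in segments]
print(tb_z, seg_coded)
```

With sum coding, the intercept is the unweighted mean of the two segment means, and the segment coefficient is half the difference between [s] and [ɕ].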
Model structure
Baseline: gamma ~ 1 + (TB_height + segment | subject) + (TB_height | item)
+ TB_height: gamma ~ TB_height + (TB_height + segment | subject) + (TB_height | item)
+ segment: gamma ~ TB_height + segment + (TB_height + segment | subject) + (TB_height | item)
+ interaction: gamma ~ TB_height * segment + (TB_height + segment | subject) + (TB_height | item)
Model comparison
Model          npar  AIC     BIC     deviance  Chi-Sq  df  p value
baseline       11    548.57  600.84  526.57
+ TB_height    12    535.66  592.69  511.66    14.907  1   < .001
+ segment      13    535.81  597.59  509.81    1.849   1   0.174
+ interaction  14    535.85  602.38  507.85    1.962   1   0.161
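The likelihood-ratio tests in this comparison can be re-derived from the reported deviances. The sketch below assumes each step adds exactly one parameter (df = 1), in which case the chi-square upper-tail probability has the closed form erfc(sqrt(x / 2)). The helper name is hypothetical, and the deviances are the rounded values from the table, so results match the reported statistics only up to rounding.

```python
import math

def lrt_df1(deviance_simple, deviance_complex):
    """Likelihood-ratio test for nested models differing by one parameter."""
    chi_sq = deviance_simple - deviance_complex
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_sq / 2))
    return chi_sq, p_value

# (model name, number of parameters, deviance) as reported in the table.
models = [
    ("baseline", 11, 526.57),
    ("+ TB_height", 12, 511.66),
    ("+ segment", 13, 509.81),
    ("+ interaction", 14, 507.85),
]

# Compare each model with the one immediately before it.
results = {}
for (_, _, dev_prev), (name, _, dev) in zip(models, models[1:]):
    results[name] = lrt_df1(dev_prev, dev)

# AIC is the deviance plus twice the number of parameters.
aic = {name: dev + 2 * npar for name, npar, dev in models}

for name, (chi_sq, p_value) in results.items():
    print(f"{name}: chi-sq = {chi_sq:.3f}, p = {p_value:.3f}")
```

Only the addition of TB_height improves fit reliably; adding segment or the interaction lowers the deviance by less than 2 units each.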
Summary of best-fitting model: fixed effects
Term         Estimate  df     t value
(Intercept)  0.254     2.494  4.567
TB_height    -1.282    2.171  -17.764
By-subject random effects
Estimates
Subject  (Intercept)  TB_height  segment=s
S01      0.110        -1.082     0.051
S02      0.146        -1.132     0.145
S03      0.329        -1.386     -0.016
All subjects show strong consistent effects of TB height
S01 and S02 show a small effect of segment type in the expected
direction, but S03 shows a small effect in the opposite direction
Summary of model results
Most variance in gamma is explained by TB height
Effect of segment type on gamma not significant across subjects
Discussion
Results consistent with the hypothesis that Japanese sibilants are
produced without active parasagittal control
Rather, parasagittal tongue shape during Japanese sibilant
production may be a passive consequence of TB height control
Consistent with cross-linguistic variation in dimensions of
phonological control, even for very similar sounds
Next step: English EMA data
English is predicted to show a different relationship between
segment identity, TB height, and parasagittal tongue shape
If English sibilants involve active parasagittal control, we would
expect a stronger, more consistent effect of segment identity on
parasagittal tongue shape
Preview: [z] grooving in ‘Wednesday’
(data from Ji et al., 2014)
Implications
Is parasagittal control related to phonotactics?
e.g., deeper grooving sustains sibilant in English consonant
clusters?
Does English tense/lax distinction involve parasagittal control?
(Stone & Lundberg, 1996)
Does relative lack of parasagittal control in Japanese underlie
difficulty of Japanese speakers learning English rhotic ~ lateral
contrast, which likely involves parasagittal control? (Ying et al., 2021)
Other next steps
More Japanese data (different speakers, different EMA sensor
arrangements)
How robust is this pattern in Japanese?
Biomechanical modeling in Artisynth (Stavness et al., 2014)
What underlies the relationship between TB height and
parasagittal tongue shape?
Investigate the nonlinearity: why the qualitatively different
pattern at lower TB heights?
Thank you!
To the experiment participants
To the Yale Phonologroup
To AMP 2021 participants
References
Hombert, J.-M., Ohala, J. J., & Ewan, W. G. (1979). Phonetic explanations for the development of tones. Language, 55(1), 37–58.
Howson, P., Kochetov, A., & van Lieshout, P. (2015). Examination of the grooving patterns of the Czech trill-fricative. Journal of Phonetics, 49, 117–129.
Ji, A., Berry, J. J., & Johnson, M. T. (2014). The electromagnetic articulography Mandarin accented English (EMA-MAE) corpus of acoustic and
3D articulatory kinematic data. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 7719–7723.
Kisler, T., Reichel, U. D., & Schiel, F. (2017). Multilingual processing of speech via web services. Computer Speech & Language, 45, 326–347.
Li, F., Edwards, J., & Beckman, M. E. (2009). Contrast and covert contrast: The phonetic development of voiceless sibilant fricatives in English
and Japanese toddlers. Journal of Phonetics, 37(1), 111–124.
Matsui, M. F. (2017). On the Input Information of the C/D Model for Vowel Devoicing in Japanese. Journal of the Phonetic Society of Japan,
21(1), 127–140.
Pouplier, M., Hoole, P., & Scobbie, J. M. (2011). Investigating the asymmetry of English sibilant assimilation: Acoustic and EPG data. Laboratory
Phonology, 2(1), 1–33.
Stavness, I., Nazari, M. A., Flynn, C., Perrier, P., Payan, Y., Lloyd, J. E., & Fels, S. (2014). Coupled Biomechanical Modeling of the Face, Jaw, Skull, Tongue, and Hyoid Bone. In N. Magnenat-Thalmann, O. Ratib, & H. F. Choi (Eds.), 3D Multiscale Physiological Human (pp. 253–274). London: Springer London.
Stone, M., & Lundberg, A. (1996). Three-dimensional tongue surface shapes of English consonants and vowels. Journal of the Acoustical Society
of America, 99(6), 3728–3737.
Whalen, D. H., Shaw, P., Noiray, A., & Antony, R. (2011). Analogs of Tahltan Consonant Harmony in English CVC Syllables. International Congress
of Phonetic Sciences (ICPhS), 2129–2132.
Ying, J., Shaw, J. A., Carignan, C., Proctor, M., Derrick, D., & Best, C. T. (2021). Evidence for active control of tongue lateralization in Australian
English /l/. Journal of Phonetics, 86.
Stimuli
弛緩性 ɕikansei
志願制 ɕigansei
しこり ɕikori
仕事 ɕigoto
死闘 ɕitoo
指導 ɕidoo
死体 ɕitai
私大 ɕidai
主観 ɕukan
主眼 ɕugan
主体性 ɕutaisei
主題歌 ɕudaika
趣向 ɕukoo
酒豪 ɕugoo
酒盗 ɕutoo
手動 ɕudoo
ストライク sutoraiku
すどく sudoku
スタンプ sutampu
すだち sudachi
少し sukoɕi
菅野 sugano
スカイ sukai
すごい sugoi