About
59 Publications
8,256 Reads
286 Citations
Introduction
Current institution
Additional affiliations
February 2018 - present
January 2016 - February 2018
February 2008 - September 2015
Education
February 2008 - October 2015
National Tsing Hua University
Field of study
- Speech science, Phonetics, Linguistics
September 2005 - February 2008
National Tsing Hua University
Field of study
- Linguistics
Publications (59)
Studies have documented a sound change in some English dialects whereby /s/ in the context /strV/ surfaces as [ʃtrV]. This can be interpreted as /s/ rounding for the rhotic being re-analyzed as [ʃ]. We recently measured F_M, the frequency of the main peak in mid-fricative spectra, in seven adults producing words with /s/ and /ʃ/ across phonetic con...
The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). “Comparing measurement errors for formants in synthetic and natural vowels,” J. Acoust. Soc. Am. 139(2), 713–727]. To date, validating its accuracy has depended on formant synthesis for groun...
Fricatives have noise sources that are filtered by the vocal tract and that typically possess energy over a much broader range of frequencies than observed for vowels and sonorant consonants. This paper introduces and refines fricative measurements that were designed to reflect underlying articulatory and aerodynamic conditions. These show differenc...
Esophageal (ES) speech, tracheoesophageal (TE) speech, and the electrolarynx (EL) are common methods of communication following the removal of the larynx. Our recent study demonstrated that intelligibility may increase for Cantonese alaryngeal speakers using clear speech (CS) compared to their everyday “habitual speech” (HS), but the reasoning is s...
Since 1995 [Shapiro, M. A case of distant assimilation: /str/ → /ʃ tr/. American Speech, 70, 101–107], many studies have noted a sound change occurring in some dialects of English with /s/ in the context /strV/ surfacing as [ʃ trV]. At first the phenomenon was restricted to certain contexts, though studies disagree on which; increasingly, those who...
Effects of syllable position on the acoustic structure of speech sounds have been explored for different consonants, particularly for the lateral approximant /l/ in American English (AE). Sproat and Fujimura [J. Phonetics 21, 291–311 (1993)] reported that /l/’s spectral properties vary between initial (light) and final (dark) position, as in [l]eas...
Formants in speech signals are easily identified, largely because formants are defined to be local maxima in the wideband sound spectrum. Sadly, this is not what is of most interest in analyzing speech; instead, resonances of the vocal tract are of interest, and they are much harder to measure. Klatt [(1986). in Proceedings of the Montreal Satellit...
Word-level prosody plays an important role in processes of consonant lenition. Typically, consonants in word-initial position are strengthened while those in word-medial position are lenited (Keating, Cho, Fougeron, & Hsu, 2003). In this paper we examine the relationship between word-prosodic position and obstruent lenition in a spontaneous speech...
Movements of the head during speech serve multiple communicative purposes, including perceptual enhancement of prosodic F0 contours. However, it remains uncertain how much of any correlation between head movement and F0 may be due to physiological coupling mechanisms exerting effects on glottal tension. In this work, six n...
It has long been established that LPC analysis results in formant estimates that are not accurate representations of the resonances; they are biased towards the nearest harmonic, and this bias worsens as F0 rises to 200 Hz or more. Manual measurement of formants with the reassigned spectrogram (RS) has been shown to be mor...
The effect of syllable position on the acoustic structure of speech sounds has been studied for different consonants. A well-known example is the lateral approximant /l/ in American English (AE), whose spectral properties vary between initial and final position, as characterized by the light–dark [l] distinction, as in [l]...
Background/aim:
The purpose of this study was to provide preliminary data concerning the effect of clear speech (CS) on Cantonese alaryngeal speakers' intelligibility.
Methods:
Voice recordings of 11 sentences randomly selected from the Cantonese Sentence Intelligibility Test (CSIT) were obtained from 31 alaryngeal speakers (9 electrolarynx [EL]...
Contours traced by trained phoneticians have been considered to be the most accurate way to identify the midsagittal tongue surface from ultrasound video frames. In this study, inter-measurer reliability was evaluated using measures that quantified both how closely human-placed contours approximated each other as well as how consistent measurers we...
The effect of syllable position on the acoustics of speech sounds has been studied for different consonants. A well-known example is the “light”-“dark” /l/ distinction in English (e.g., [l]eaf vs. fee[ɫ]). Compared to light onset [l], dark coda [ɫ] generally displays more acoustic energy in the lower frequencies. /l/ shares articulatory and histori...
Alaryngeal speech is an alternative method of verbal communication following the removal of the larynx. Common alaryngeal speaking methods include esophageal (ES) speech, tracheoesophageal (TE) speech, and speech produced using an electrolarynx (EL). A recent study [Hui et al., in press, Folia Phoniatr Logop] demonstrated...
DOWNLOAD: https://haskinslabs.org/sites/default/files/files/Reprints/hl2008.pdf
Vowel-intrinsic fundamental frequency (IF0), the phenomenon that high vowels tend to have a higher fundamental frequency (f0) than low vowels, has been studied...
Ultrasound imaging (USI) of the tongue has been widely used for research purposes as well as in clinical trials (e.g., visual biofeedback). However, USI contouring is a labor-intensive task. Recent improvements in automatic methods of USI contouring have focused on two directions: 1) edge detection/tracking algorithms, such as EdgeTrak (SNAKE-based...
This study aims to establish a model for the variability of fricatives in normal speakers. Such a model is useful for studying populations with disordered speech (e.g., laryngectomies, glossectomies, adolescents with residual speech sound errors, etc.). Seven normal adult speakers (4 women, 3 men) were recorded uttering a...
Purpose
The purpose of this study was to report the variability of electrolarynx (EL) users' speech intelligibility in quiet and in multitalker babble.
Method
Ten EL users (five Servox® Digital, five TruTone™) who were at least 2 years postlaryngectomy provided recordings of five sentences from the 1965 Revised List of Phonetically Balanced Senten...
Word-level prosody plays an important role in processes of consonant lenition. Typically, consonants in word-initial position are strengthened while those in word-medial position are lenited (Keating et al., 2003). In this paper we examine the relationship between word-prosodic position and obstruent lenition in a spontaneous speech corpus of Yolox...
Patterns of relative timing between consonants and vowels appear to be conditioned in part by phonological structure, such as syllables, a finding captured naturally by the two-level feedforward model of Articulatory Phonology (AP). In AP, phonological form – gestures and the coordination relations between them – receive an invariant description at...
The 720 phonetically balanced IEEE sentences have been recorded from eight speakers using electromagnetic articulometry (EMA) at “normal” and “fast” production rates. Participants self-selected their preferred normal rate and were instructed to produce the fast rate as quickly as possible without making errors (errorful pr...
The electrolarynx (EL) is a hand-held electronic device that provides individuals with a means of communicating verbally postlaryngectomy. The EL produces a vibratory sound source that can be transmitted through the neck, where it excites vocal tract resonances generating speech. While some users attain a high level of int...
Ultrasound imaging is a non-invasive technique for the measurement of the tongue in speech. Recent advancements in analytical edge detection algorithms and deep learning methods have improved tongue contour segmentation. However, most edge detection algorithms require user input as initialization “seeds” and accuracy can d...
Previous studies found that non-native vowel categorization can be explained by the Perceptual Assimilation Model (PAM), based on perception experiments. However, fewer production studies have been carried out to further support the findings. Here we examined English monophthongs and diphthongs produced by Mandarin learners of English (MAE), compar...
Speech is notoriously variable, but our understanding of this variability continues to evolve. Variability has typically been taken as an indication of failure to reach a desired target due to physical or neurological limits. However, it is likely that some variability is beneficial, an effect that has been found in other domains. Part of the effor...
Variability is widespread in speech, but it is unlikely that all of it is harmful; variability in other domains has been shown to allow flexibility, within limits. Using one technique for separating the two, we applied Uncontrolled Manifold Analysis (UCM) to vowels in running speech. This results in two multidimensional manifolds, one the controlle...
Standard Chinese maintains a three-way place distinction among sibilants: (Denti)-Alveolar /s/, 'Retroflex' (Post-Alveolar) /ʂ/, and (Alveolo)-Palatal /ɕ/. While Taiwanese Mandarin generally preserves the standard consonant inventory, previous studies have described its retroflex coronals as being partially merged with alveolars, with higher ac...
The vowel-intrinsic fundamental frequency (IF0) is a universal tendency for high vowels to have higher F0 than low vowels. The "tongue pull" hypothesis is the most successful account of IF0, but other factors seem to play a role as well. Few studies have investigated the articulatory correlates of IF0, and their results are somewhat inconsistent. H...
Many developmental studies attribute reduction of acoustic variability to increasing motor control. However, linear prediction-based formant measurements are known to be biased toward the nearest harmonic of F0, especially at high F0s. Thus, the amount of reported formant variability generated by changes in F0 is unknown. Here, 470,000 vowels...
Previous studies have used speech variability as a measure of speech development; for instance, children reduce the variability in their formant frequencies as they grow, indicating increases in speech motor control. However, formant measurements in most of these studies are computed using variants of linear prediction coding (LPC), which is known...
When using ultrasound imaging of the tongue for speech recording/research, submental transducer stabilization is required to prevent the ultrasound transducer from translating or rotating in relation to the tongue. An iterative prototype of a lightweight three-dimensional-printable wearable ultrasound transducer stabilization system that allows fle...
Glossectomy surgery affects the ability to elevate the tongue tip (Grimm et al., JSLHR v.60, 3417–3425, 2017), which is known to affect production of /s/. Interestingly, /ʃ/ is less affected by glossectomy surgery. In this study, the acoustics of normal and aberrant /s/ and /ʃ/ were characterized using three parameters developed for normal adults an...
Motor equivalence in articulation refers to different articulatory configurations that yield similar, if not identical, acoustic output (e.g., Perkell et al., 1993; Perrier & Fuchs, 2015). Here we report the use of such motor equivalence in English vowels, predicted by deep neural network (DNN) models.
Speech, though communicative, is quite variable both in articulation and acoustics, and it has often been claimed that articulation is more variable. Here we compared variability in articulation and acoustics for 32 speakers in the x-ray microbeam database (XRMB; Westbury, 1994). Variability in tongue, lip and jaw positions for nine English vowels...
In this study, we describe an approach to finding an optimal coordinate origin for rotating/translating tongue surface contours derived from ultrasound images to an external (head-centric) coordinate system. In addition, we report validation tests of the positional accuracies of the tracked contours in ultrasound images while the probe is free to m...
We examined the factors which contributed to extreme variation in the production of obstruents in Yoloxóchitl Mixtec using automated methods on a spontaneous speech corpus. Stress contributed to both patterns of partial voicing and to patterns of lenition in the corpus. A set of deep neural network models were constructed to model qualitative diffe...
Previous studies have claimed that lower formants should be weighted more than higher formants in a perceptual model of vowel perception (e.g., Schwartz et al., 1997). Given this formant weighting hypothesis, and if vowels have acoustic targets, vowels should be more variable in higher formant frequencies. Here, we examined within-speaker variabili...
Purpose
Models of speech production often abstract away from shared physiology in pitch control and lingual articulation, positing independent control of tone and vowel units. We assess the validity of this assumption in Mandarin Chinese by evaluating the stability of lingual articulation for vowels across variation in tone.
Method
Electromagnetic...
Spectral moments have been taken as the primary measurements of fricatives, but resonances are evident as well. To contrast the two, the X-ray Microbeam Database (XRMB) was used to investigate acoustic and articulatory behavior in [s] for 10 /sVd/ words and 9 /sCV*/ words for 24 subjects as in a previous study [Iskarous et al., JASA 129:2, 2011]. S...
This study proposes a method of superimposing a physical palatal profile, extracted from a speaker's maxillary impression, onto real-time mid-sagittal articulatory data. A palatal/dental profile is first obtained by three-dimensional–scanning the maxillary impression of the speaker. Then a high resolution mid-sagittal palatal line, extracted from t...
Regression analysis and mutual information have been used to measure the degree of dependence between a consonant and a vowel, and this has been used to identify the invariance of consonant place and to quantify the coarticulatory resistance of consonants [e.g., Fowler (1994). Percept. Psychophys. 55, 597–610]. This paper presents the first applica...
"MELTER" is an Excel-embedded computer program for converting data table between wide and long format in Excel (from long table to wide table or the reverse).
This can also be done easily with the 'melt' function in R, but MELTER comes in handy when you don't want to use R for simple tasks. See the "Instruction" sheet in the file for details.
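For readers unfamiliar with the wide/long distinction, the conversion can be sketched in a few lines of plain Python. This is only an illustration of the reshaping idea, not MELTER's own implementation; the function names, the column names ("speaker", "F1", "F2"), and the sample values are all hypothetical.

```python
def wide_to_long(rows, id_col, var_name="variable", value_name="value"):
    """Turn one row per id into one row per (id, variable) pair."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the id column is repeated on every output row
            long_rows.append({id_col: row[id_col], var_name: key, value_name: value})
    return long_rows


def long_to_wide(rows, id_col, var_name="variable", value_name="value"):
    """Inverse step: gather the (variable, value) pairs of each id into one row."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_col], {id_col: row[id_col]})
        record[row[var_name]] = row[value_name]
    return list(wide.values())


# Hypothetical wide table: one row per speaker, one column per measurement.
wide_table = [
    {"speaker": "S1", "F1": 300, "F2": 2200},
    {"speaker": "S2", "F1": 350, "F2": 2100},
]
long_table = wide_to_long(wide_table, "speaker")
```

Here `long_table` holds one row per speaker-measurement pair, and `long_to_wide(long_table, "speaker")` restores the original wide layout.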
Coronal consonants hold a special status for their crowded space for articulatory contrasts. Mandarin and its variants are known to have rich inventories of coronal consonants, where a three-way coronal place contrast is usually maintained. Nonetheless, previous studies have reported a (partial) neutralization of coronal places of articulation in T...
Palatal traces reconstructed by current advanced technologies of real-time mid-sagittal articulatory tracking (e.g., EMA, ultrasound, rtMRI, etc.) are mostly low-resolution and lack concrete anatomical/orthodontic reference points as firm articulatory landmarks for determining places of articulation. The present study proposes a method of superi...
We studied tone-vowel coproduction using Electromagnetic Articulography (EMA). Fleshpoints on the tongue and jaw were tracked while native Chinese speakers (n = 6) produced three vowels, /a/, /i/, /u/, combined with four Chinese tones. We found differences in tongue position across tones for /a/ and for /i/ but not for /u/. The low and rising tones...
Studies in locus equations, a quantification of the degree to which F2 at vowel onset (or consonant place) can be predicted by F2 at vowel midpoint (or vowel place), have shown that the slope of locus equations is a reverse measure of coarticulatory resistance of consonants, in consonant-vowel (CV) sequences with C fixed and V varying. This study p...
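As a rough illustration of the locus equation described in the abstract above, the sketch below fits a least-squares line predicting F2 at vowel onset from F2 at vowel midpoint and returns its slope and intercept. The function name and the F2 values are hypothetical and made up for illustration; they are not data from the study, and the study's own analysis may differ.

```python
def locus_equation(f2_mid, f2_onset):
    """Least-squares fit of F2 onset = slope * F2 midpoint + intercept."""
    n = len(f2_mid)
    if n < 2 or n != len(f2_onset):
        raise ValueError("need two equally long lists with at least two tokens")
    mean_x = sum(f2_mid) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_mid)
    if sxx == 0:
        raise ValueError("F2 midpoint values must vary across vowels")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_mid, f2_onset))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical F2 values (Hz) for one consonant before four different vowels.
f2_mid = [800.0, 1200.0, 2000.0, 2400.0]
f2_onset = [1300.0, 1500.0, 1900.0, 2100.0]
slope, intercept = locus_equation(f2_mid, f2_onset)
```

Under the interpretation given in the abstract, a slope near 0 would indicate that onset F2 barely follows the vowel (high coarticulatory resistance), while a slope near 1 would indicate that it follows the vowel closely (low resistance).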
Speech errors in Taiwanese are investigated by means of a speeded repetition task. Our results show that the intrusion bias is also attested for word pairs with mismatched onsets, whereas in the alternating coda condition, reduction errors are the most frequent error type. This cross-linguistic difference can be attributed to language-specific impl...