Article

A Cross-Language Study of Voicing in Initial Stops: Acoustical Measurements

Authors: Leigh Lisker and Arthur S. Abramson
... (1) a. /tb/ → [db]: e.g., hát-ba "back-ill"; két#barát "two friends" b. /zt/ → [st]: e.g., víz-től "water-abl"; víz#torony "water tower" c. /ɡdh/ → [kth]: e.g., smaragd-hoz "emerald-allat" d. méz [z] "honey" ~ mész [s] "limestone" English, just like Hungarian, displays a symmetrical laryngeal obstruent system, but unlike Hungarian and Spanish, English is an aspirating language (Lisker & Abramson, 1964), that is, the contrast of stops is based on aspiration rather than voicing. "Voiced" stops, or as they are generally referred to in the phonological literature, lenis stops (in initial position) are typically produced with zero or short-lag VOT, though negative VOT is also attested (e.g., Flege, 1982); thus, phonetically they are typically voiceless and unaspirated, while voiceless, or fortis, stops are produced prevocalically with a relatively long-lag VOT (i.e., aspirated). ...
... We summarize the relevant features of the laryngeal system of the three languages in Table 1. (2) a. /sd/ → [zð] desde "from"; /sb/ → [zβ] coches#baratos "cheap cars" b. /sl/ → [zl] isla "island"; /sm/ → [zm] las#minas "the mines" ...
... The presence or absence of aspiration was detected by measuring the Voice Onset Time. Since Lisker and Abramson (1964), VOT has been one of the most established measurements of the laryngeal differences between (non-final) stop consonants; it is defined as the timing relation between the moment of the release of the stop and the onset of glottal pulsing of the next vowel or sonorant consonant (Abramson & Whalen, 2017, provide a good overview of the theoretical and practical issues concerning VOT). This definition has become so commonplace in the phonetic literature that many authors do not specifically indicate how exactly they measured VOT. ...
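The measurement described in this excerpt reduces to simple arithmetic once the release burst and the onset of glottal pulsing have been located, whether by hand or automatically. As a minimal illustration only, the Python sketch below computes VOT as the signed difference between two hypothetical annotation times, so that prevoiced stops come out negative and aspirated stops strongly positive; the landmark values are invented and are not taken from any of the studies cited here.

from dataclasses import dataclass

@dataclass
class StopToken:
    """Hand-annotated landmarks for one stop token, in seconds."""
    label: str            # e.g. "p" or "b"
    release_time: float   # time of the stop's release burst
    voicing_onset: float  # onset of glottal pulsing (closure voicing or following vowel)

def vot_ms(token: StopToken) -> float:
    """VOT = voicing onset minus release, in milliseconds.

    Negative values mean voicing starts before the release (voicing lead);
    positive values mean voicing lags behind the release (short or long lag).
    """
    return (token.voicing_onset - token.release_time) * 1000.0

# Illustrative (made-up) tokens: a prevoiced /b/ and an aspirated /p/.
tokens = [
    StopToken("b", release_time=0.512, voicing_onset=0.427),
    StopToken("p", release_time=1.204, voicing_onset=1.262),
]
for t in tokens:
    print(f"/{t.label}/  VOT = {vot_ms(t):+.0f} ms")

With these invented landmarks the sketch prints a voicing lead of −85 ms for the /b/ token and a lag of +58 ms for the /p/ token.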
Article
Full-text available
The present paper investigates the link between perception and production in the laryngeal phonology of multilingual speakers, focusing on non-contrastive segments and the dynamic aspect of these processes. Fourteen L1 Hungarian, L2 English, and L3 Spanish advanced learners took part in the experiments. The production experiments examined the aspiration of voiceless stops in word-initial position, regressive voicing assimilation, and pre-sonorant voicing; the latter two processes were analyzed both word-internally and across word boundaries. The perception experiments aimed to find out whether learners notice the phonetic outputs of these processes and regard them as linguistically relevant. Our results showed that perception and production are not aligned. Accurate production is dependent on accurate perception, but accurate perception is not necessarily transferred into production. In laryngeal postlexical processes, the native language seems to play the primary role even for highly competent learners, but markedness might be relevant too. The novel findings of this study are that phonetic category formation seems to be easier than the acquisition of dynamic allophonic alternations and that metaphonological awareness is correlated with perception but not with production.
... To investigate the intricacies of plosive articulation within a language or cross-linguistically, the acoustic analysis of Voice Onset Time or VOT has been widely used (Lisker & Abramson, 1964). VOT, defined as the time interval between the release of a stop consonant and the beginning of vocal cord vibration which initiates voicing, is a crucial acoustic cue that differentiates plosive sounds in speech visualized by their waveforms (Ladefoged & Johnson, 2015). ...
... Since Thai and English have different numbers of plosive consonants and categories, the VOT of their plosives also varies and has therefore become a subject of interest in cross-language research (e.g., Donald, 1976; Kessinger & Blumstein, 1997; Lisker & Abramson, 1964). The different VOT categories and ranges may affect the pronunciation of English plosives spoken by Thais. ...
... In Thai, there are eight plosive consonants, namely /b, d, p, t, k, pʰ, tʰ, kʰ/, from three places of articulation: bilabial, alveolar, and velar. They are divided into three categories: voiced, voiceless unaspirated, and voiceless aspirated (Lisker & Abramson, 1964), as illustrated in Table 1. Aspiration plays an important role in distinguishing phonemes in the voiceless category; for example, /ta/ <ตา> means 'eyes' while /tʰa/ <ทา> means 'paint or apply liquid on a surface'. ...
Article
Full-text available
This study aims to investigate the VOT values of English word-initial plosive consonants produced by young Thai learners to understand current trends in English pronunciation among Thai speakers and its future direction. The study analyzes how phonological mismatches between Thai and English affect the pronunciation of Thai learners, using a speech corpus produced by 49 seventh-grade students. The results reveal recurring patterns of consonant pronunciation, classified into Consistent and Inconsistent groups. While voiced plosives /b/ and /d/ were mostly pronounced with voicing lead, resembling Thai phonetic norms, velar /g/ was frequently substituted with /k/ due to the absence of /g/ in the Thai sound system. However, some participants demonstrated a shift toward more native-like English pronunciation. Voiceless plosives were generally produced with long-lag VOT, aligning with both Thai and English norms, although some inconsistencies in aspiration were noted. The findings highlight the dynamic shift of English pronunciation among young Thai speakers. This research contributes to a deeper understanding of English spoken by Thais and facilitates instructors in designing targeted pronunciation tasks.
... One of these is the timing of voicing in stops. Lisker and Abramson (1964) identified different VOT durations in English stop consonants. They stated that the English voiceless stops /p/, /t/, and /k/ have longer VOT values. ...
... Docherty (1992) carried out an experiment on the timing of voicing in British obstruents. His findings confirmed the previous findings by Lisker and Abramson (1964) that the short-lag VOT in /b/ was significantly shorter than that for /ɡ/. Docherty (1992) concluded that longer VOT durations are related to velar stops. ...
... In this study, the data were recorded and analysed using Praat software. For voiceless stops, the duration of VOT was measured from the offset of the hold phase (the release burst) to the onset of vocal fold vibration of the following vowel (Lisker & Abramson, 1964; Jannedy, 1995). In voiced stops, VOT was measured from the onset of vocal fold vibration during the hold phase to the release burst. ...
Article
Full-text available
This study aims to investigate the duration of voice onset time of single stop consonants in Tripolitanian Libyan Arabic. It also seeks to identify any potential influence of the place of articulation of these stops and the vocalic context on this duration. Four Tripolitanian Libyan Arabic speakers were recorded while producing 39 monosyllabic words with /b/, /t/, /d/, /k/ and /ɡ/ followed by the vowels /iː/, /i/, /aː/, /a/, /uː/, /u/, /eː/ and /oː/. The duration of positive voice onset time was measured from the release burst to the onset of vocal fold vibration. For negative voice onset time, the duration was measured from the initiation of voicing during the hold phase to the release burst. Results of the analysis show that voice onset time in Tripolitanian Libyan Arabic falls into two categories: voiceless stops have positive voice onset time values ranging from 14 ms to 44 ms, while voiced stops have negative voice onset time values ranging from −33 ms to −60 ms. Results have also revealed that voice onset time varies as a function of the place of articulation of the stop and the quality and duration of the following vowel. As the stop's place of articulation moved from an anterior to a posterior point in the vocal tract, the duration of voice onset time seemed to increase. The duration of voice onset time was longer when voiceless stops were followed by a close vowel than when they were followed by a non-high vowel. Finally, voice onset time was longer when voiceless stops were followed by the long vowels /iː/ and /uː/; this tendency was not observed when the stops were followed by /aː/.
... The quality and quantity of L2 input, as well as language dominance, have thus become a more important factor when considering the phonetic variation of bilinguals (Flege and Liu 2001; Flege, Mackay and Piske 2002; Flege 2007, 2018; Flege and Wayland 2019; VanPatten, Smith and Benati 2020; Yule 2020). In many studies that measure the timing of voicing of stop consonants, voice onset time (VOT) has been widely adopted as a parameter since Lisker and Abramson (1964) first proposed it as an acoustic measure in the production of stop consonants across languages. Over 50 years later, VOT has still proven to be a robust measure of voicing distinctions in most languages (Abramson and Whalen 2017). ...
... Traditionally, the Tamil orthography uses a single symbol at each place of articulation of obstruents, and a geminate is used to indicate a voiceless stop consonant (Keane, 2004; Wiltshire and Harnsberger 2006; Bhaskararao 2011; Kanapathy 2015). Due to extensive loanwords, voicing is treated as partially allophonic in Tamil among the Dravidian languages, and the use of voiced stop consonants has been reported in previous studies (Lisker and Abramson 1964; Kanapathy 2015; Wiltshire 2015). As verified by the Malaysian Tamil native speakers and linguistic experts, the voicing distinction in word-initial stop consonants is present due to the large amount of non-native vocabulary. ...
... Even though English voiced stop consonants have no counterpart in the L1 phonemic inventory, they are often phonetically implemented as unaspirated stop consonants, which are not new to the Mandarin-English bilinguals. Besides, the occurrence of voicing lead may be induced by the phonological features of the Malay language in the linguistic repertoire of Mandarin-English bilinguals, since English voiced stop consonants are rarely phonetically realised as voiced stop consonants (Lisker and Abramson 1964; Docherty 1992; Cho and Ladefoged 1999; Cho, Whalen and Docherty 2019). On the other hand, English voiced stop consonants produced by Malay-English and Tamil-English bilinguals are usually phonetically realised as voiced stop consonants rather than unaspirated stop consonants. ...
Article
In multiracial and multilingual Malaysia, Malaysian English (MalE) is not a homogeneous variety. Thus, the present study examines the phonetic implementation of voicing contrast in MalE across three major ethnic groups, Malay, Chinese and Indian, and compares the results with their first languages (L1s) and British English due to the historical ties. Voice onset time (VOT) and closure duration are measured and analysed in within-group and between-group comparisons. Findings reveal evident L1 influence on the initial stop production of Malaysian bilingual speakers, and simultaneous influence of British English due to long-term language contact. The influence of Malay as the national language is also observed. While VOT appears to play a role in discriminating between voiced and voiceless initial stop consonants, closure duration does not reflect equivalent significant effects. Hence, the phonetic and phonological features of MalE in multilingual Malaysia offer insights into one of the Englishes spoken beyond the Inner Circle.
... One acoustic feature often used to investigate phoneme identification is the Voice Onset Time (VOT) of stop consonants - the time that elapses between the release of a closure in the vocal tract and the onset of vocal fold vibration (Lasky et al., 1975; Lisker & Abramson, 1964; Streeter, 1976). Across the world's languages, VOT is a key feature used to distinguish between different categories of consonants. ...
... Some languages, such as English and Mandarin Chinese, have a two-way VOT distinction, categorizing consonants as either "voiced" (with a short VOT) or "voiceless" (with a long VOT) (Lisker & Abramson, 1964; Rochet & Fei, 1991). Other languages, like Thai and Korean, have a three-way VOT contrast, which includes "voiced" (short VOT), "voiceless unaspirated" (medium VOT), and "voiceless aspirated" (long VOT) categories (Lisker & Abramson, 1964; Cho, Jun & Ladefoged, 2002). ...
... Previous acoustic measurements have shown that Singapore English (Huang, 2003) and Singapore Mandarin (Ng, 2005) differ in their VOTs, with typical voiced stops (/b/, /d/, /ɡ/) and voiceless stops (/p/, /t/, /k/) occurring rather close together for Singapore English (4 ms and 25 ms, respectively, with notable pre-voicing in certain contexts), and rather far apart for Singapore Mandarin (9 ms and 90 ms, respectively). ...
Article
Full-text available
Phoneme identification is critically involved in alphabetic reading, and weak skills in phoneme perception are known to correspond to the development of reading skills in children around the age of reading instruction. However, current models of phoneme perception have been established in monolingual populations, and these relationships may not hold for bilingual children. We developed a phoneme identification task, the CROWN Game, as part of a comprehensive assessment package for identifying potential reading difficulties. In the task, children perform a phoneme identification task, using familiar words with onset consonants spanning a voice-onset time continuum from /b/ to /p/ (−60 ms to 90 ms). 138 English-Chinese bilinguals in Singapore kindergartens were tested on phoneme perception for the English words “beach” and “peach.” We fitted individual responses to a psychometric function and measured individual VOT threshold and the slope of the transition between categories. The task showed a wide spread of scores and good split-half reliability, suggesting that the CROWN Game is suitable for identifying stable individual differences between children. In a preregistered subgroup analysis, we examined whether a child’s language background (input ratios and age of acquisition) influenced phoneme perception. GLMMs showed no influence of language exposure on phoneme perception in this task. This suggests that the CROWN Game, as part of a broader assessment package, is valid for use with bilingual children with different rates of exposure to their languages in evaluating phoneme perception skills relevant to reading development.
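The abstract above mentions fitting each child's identification responses to a psychometric function to extract a VOT threshold (the category boundary) and a slope. The Python sketch below shows one common way such a fit can be done with a two-parameter logistic curve; the continuum steps and response proportions are simulated for illustration, and this is not the CROWN Game's actual analysis pipeline (the study reports GLMMs).

import numpy as np
from scipy.optimize import curve_fit

def logistic(vot, threshold, slope):
    """Probability of a /p/ response as a function of VOT (ms)."""
    return 1.0 / (1.0 + np.exp(-slope * (vot - threshold)))

# Hypothetical continuum steps (ms) and proportion of /p/ responses per step.
vot_steps = np.array([-60, -40, -20, -10, 0, 10, 20, 40, 60, 90], dtype=float)
p_resp    = np.array([0.02, 0.05, 0.08, 0.15, 0.35, 0.70, 0.90, 0.97, 0.99, 1.00])

# Fit the two parameters: the category boundary (50% point) and the slope.
(threshold, slope), _ = curve_fit(logistic, vot_steps, p_resp, p0=[10.0, 0.1])
print(f"VOT boundary ~ {threshold:.1f} ms, slope ~ {slope:.3f} per ms")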
... One of the interesting topics in phonological cross-linguistic studies is laryngeal contrast between homorganic stops. This contrast is phonetically manifested by several acoustic correlates, the most important of which is voice onset time, or VOT (Lisker & Abramson, 1964). In true voiced stops, onset of glottal pulsing before the release of a stop results in negative VOT (prevoicing, or voice lead). ...
... Laryngeal Realism posits that the type of laryngeal contrast and the phonological features that encode it are determined by the prevailing VOT categories in a language and by speakers' control of the most important acoustic correlates of these features. We found three categories of VOT in voiced and voiceless stops: short lag, long lag, and voice lead (Lisker & Abramson, 1964). ...
... The results support predictions of LR and reveal four major types of laryngeal contrast across Western Iranian languages spoken in Iran (Table 17). The typology of the laryngeal contrasts in Western Iranian languages is consistent with the cross-linguistic tendencies (e.g., Lisker & Abramson, 1964). The realization of a three-way contrast in Kurmānji Kurdish is similar to what was reported for other languages with this laryngeal contrast, e.g., Thai (Kessinger & Blumstein, 1995). ...
Preprint
Full-text available
Western Iranian languages, spoken by ethnic minorities in Iran, are facing the risk of extinction by losing their status against the official language. As a result, they are understudied or very poorly documented. Although the typology of Iranian languages has been studied extensively (e.g., Dabir-Moghaddam, 2014), less attention has been paid to their phonetic and phonological features. The present study investigates the typology of voicing contrasts in stop consonants within the laryngeal realism framework in twelve Western Iranian languages of Iran selected from the Balochi, Caspian, Gorani, Kurdic, Larestani, Luri, Semnani, and Tatic subgroups. We tested the realization of voicing by measuring VOT and F1 values word-initially, and the degree of closure voicing in word-medial position. Our findings reveal the existence of three VOT categories (voice lead, short lag, and long lag) and four types of phonological laryngeal contrasts: [voice]-Ø-[s.g.], [voice]-Ø, Ø-[s.g.], [voice]-[s.g.]. Realization of the uvular obstruent, which does not have a contrastive laryngeal cognate, also revealed variation: it was produced as a voiced stop or fricative, a voiceless unaspirated stop, or a voiceless aspirated stop across languages. The results suggest that VOT is subject to geographical variation and can act as a linguistic variable in linguistic communities.
... Voice Onset Time (VOT) is considered to be the most salient cue differentiating the language-specific realizations of plosives and it refers to the interval between the release of the stop and the onset of voicing (Lisker & Abramson, 1964). There exist three different types of VOT: 'voicing lead' or 'prevoicing' (voicing starts before the closure release of the stop consonant), 'short voicing lag' (voicing begins with the release or shortly after it, 0–30 ms), and 'long voicing lag' (voicing starts after the release, > 30 ms). ...
... Many of the world's languages distinguish two categories of stops, voiced and voiceless, which, depending on the language, are associated with different types of VOT. Even though English, Spanish and French are all languages with two categories, namely a two-way stop contrast, they implement all three voice onset timing categories explained above: 1) voiced and unaspirated: voicing begins before release, i.e. voice lead VOT; 2) voiceless and unaspirated: voicing starts just after the release, i.e. short lag VOT; 3) voiceless and aspirated: voicing lags behind the release, i.e. long lag VOT (Lisker and Abramson, 1964). ...
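The three broad timing categories described in the two excerpts above can be expressed as a simple rule of thumb. The Python sketch below uses the 0–30 ms short-lag range quoted above as an approximate cut-off; actual boundaries vary by language and place of articulation, so the thresholds are illustrative rather than definitive.

def vot_category(vot_ms: float) -> str:
    """Map a VOT value (in ms) onto the three broad timing categories.

    Approximate boundaries taken from the excerpts above:
    voicing lead (< 0 ms), short lag (0-30 ms), long lag (> 30 ms).
    """
    if vot_ms < 0:
        return "voicing lead (prevoiced)"
    if vot_ms <= 30:
        return "short lag (voiceless unaspirated)"
    return "long lag (voiceless aspirated)"

# Illustrative values only.
for v in (-85, 12, 58):
    print(f"{v:+4d} ms -> {vot_category(v)}")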
Article
Full-text available
The study examines the perception of labial stops by English(L1)/Spanish(L2) [Group A] and Spanish(L1)/English(L2) [Group B] learners of French (L3). We investigate Cross-Linguistic Influence (CLI) processes of L3 speech perception by looking at how the previously acquired languages shape L3 perception (progressive CLI) and how an L3 affects the categorization of L2 and L1 sounds (regressive CLI). The possibility that L3 speakers have a single perception system for all languages was also examined. Participants had to identify stimuli from a VOT continuum as either /p/ or /b/, in different languages. Evidence of hybrid L1/L2→L3 progressive cross-linguistic influence was found for Group A and only L1→L3 for Group B. No patterns of regressive CLI were observed. Finally, it is not always the case that trilinguals make use of different perception systems when listening to L1, L2 and L3.
... It has been demonstrated that phonetic features that are perceived as being salient are more likely to appear in the listener's own productions compared to less salient features (Labov et al., 2013). As VOT is considered to be a salient feature (Kohler, 1984; Lisker and Abramson, 1964), listeners are generally able to converge to it (Nielsen, 2011; Paquette-Smith et al., 2022; Schertz and Paquette-Smith, 2023; Shockley et al., 2004; Wade et al., 2021). Consequently, in situations in which different varieties come into contact, variants that are salient in perception (i.e., VOT is more salient than closure duration) and those that listeners are able to imitate are more likely to be adapted and, ultimately, may lead to a sound change (Beddor, 2012; Hadodo, 2019; Harrington et al., 2008; Lindblom et al., 1995; Ohala et al., 1981). ...
... In German-speaking Switzerland, in particular, several lexical items are borrowings from either GSG, English, Italian, or French, although it is not always clear when and from which language they were borrowed and why some of them might be aspirated and others not. Although the precise origin of some borrowings in Swiss German may not always be easy to determine, this proves relevant to the sound change, as GSG and English are aspirating languages and Italian and French are not (Cho and Ladefoged, 1999; Jessen, 1999; Lisker and Abramson, 1964). A similar issue arises when it comes to word age (Walker and Hay, 2011) and different usage patterns that depend on the age of the speakers. ...
Article
Full-text available
Recent evidence suggests an ongoing sound change in Zurich German, where the primary cue between lenis and fortis plosives is commonly considered to be closure duration, while both plosive types are traditionally unaspirated and phonetically voiceless. There has been a shift toward more lexical items being aspirated by younger speakers, who also are shown to produce generally longer voice onset times (VOTs) in comparison to older speakers. The current study investigates word-medial and word-initial plosives in speech perception and production. Using the apparent-time paradigm, two experiments were conducted with 48 speakers of Zurich German belonging to 2 age groups. Results confirm that younger speakers produce more aspiration in word-initial fortis plosives than older speakers but disconfirm previous findings which found a reduction in closure duration of fortis plosives. Results from the perception experiment reveal that, word-initially, VOT seems to increase in importance and closure duration is not always sufficient in distinguishing between lenis and fortis plosives. Results further highlight the importance of lexical differences, according to which production and perception are either aligned or misaligned. Overall, the current study provides evidence for a sound change affecting word-initial fortis plosives in Zurich German in speech perception and production.
... In addition, VOT can be different for different plosives, and also for the same plosive in different languages. For example, Lisker & Abramson (1964) report the following VOT values for consonants in Hindi and Marathi (multiple productions by a single speaker of each language). ...
... Table 4: Voice Onset Time (in msec) in Hindi and Marathi (Lisker & Abramson, 1964). Note: the values represent multiple productions for a single speaker of each language. ... 10.2. Intrusive consonants. Consider the etymology of some English surnames: Johnson is John's son and Robertson is Robert's son. ...
Book
Full-text available
This book aims to provide a comprehensive introduction to phonetics and phonology with emphasis on the relation between the fields. In addition to the more “traditional” topics, such as articulation and phonological analysis, the book discusses topics such as emotive speech and sound symbolism, which rarely appear in general phonetics and phonology textbooks. Another advantage of this book is the use of many media files, which make the contents of the book more vivid.
... When an L2 phoneme is similar but not identical to an L1 phoneme, learners will face the largest challenge in transfer (Flege, 2003), resulting in the assimilation of the L2 sound to a similar L1 candidate (Antoniou et al., 2010; Caramazza et al., 1973; Flege, 1991). The production of stops serves as an exemplar, of which one important measure is the voice onset time (VOT), which denotes the time interval between the release of constriction and the onset of glottal pulsing in articulation (Lisker & Abramson, 1964; Syed & Bibi, 2024). If glottal pulsing commences during the full closure before the stop release (prevoicing), such stops are voiced with a negative VOT; if the pulsing commences after the release, such stops are voiceless with a positive VOT. ...
... The target L2 sounds were Japanese stops given that Japanese voiced stops are prevoiced in many cases, which overlaps with Shanghainese phonology but not with Mandarin. In addition, although our participants had learned English before learning Japanese, English stops essentially contrast in aspiration rather than voicing (Lisker & Abramson, 1964; Nagle, 2018). Therefore, Shanghainese should be the only source for bilingual learners to make transfers in learning Japanese voiced stops. ...
Article
Full-text available
The L1-transfer pattern may be different between bilinguals and monolinguals, as the former have multiple L1 candidates to transfer from. This study compared how Mandarin monolingual learners (MDN), Shanghainese-Mandarin bilingual learners (SHM), and Japanese natives produce Japanese stops in word-reading and paragraph-reading tasks. The L2 Japanese learners varied in their years of learning (1-3 years). Shanghainese differs from Mandarin in that word-medial voiced stops are prevoiced, which may allow facilitative transfer to Japanese voiced stops. As a result, SHM in general showed more target-like pronunciation of voiced stops than MDN. Regarding L2 experience, third-year SHM produced more target-like word-medial voiced stops, whereas first-year SHM produced less target-like word-initial voiceless and word-medial voiced stops. These results suggest that the overlap between the target L2 and one of the learners' L1s may lead to finer phonetic realization, but the facilitative transfer is subject to bilingual learners' L2 experience.
... Thus, here the importance of VOT cannot be neglected. Lisker and Abramson (1964) claim that VOT is an important cue in analyzing plosive sounds. The duration of VOT changes with the place of articulation of different sounds. ...
... The second finding is that VOT is longer if the contact area is extended (Stevens, Keyser & Kawasaki, 1986). The third is that VOT becomes shorter with faster movement of the articulators (Hardcastle, 1973). These patterns have remained valid for many years. Lisker and Abramson (1964) further state that VOT is longer in velar plosives. Furthermore, they claim that VOT is shorter for both aspirated and unaspirated bilabial plosives and intermediate for alveolar plosives. ...
Article
Full-text available
The main aim of this study is to analyze the Voicing Onset Time (VOT) of Pashto and English plosive sounds through Praat analysis. Additionally, the researchers intend to highlight the issues in the production of plosive sounds by EFL Pashto learners due to the differences between the voicing onset time of Pashto and English plosives. The researchers collected data from L1 Pashto speakers, EFL Pashto learners, and L1 English speakers through voice recordings, which were then analyzed with the Praat software. The study shows a clear difference in the VOT of the plosive sounds of the two languages: the VOT durations of English plosives are longer than those of Pashto plosives, and Pashto EFL learners are thus unable to produce these sounds like English RP speakers. The study is beneficial for Pashto-speaking EFL learners and for teachers of English as a second or foreign language in the field of English Language Teaching. EFL students and teachers would better realize the differences in the production of L2 English speech sounds and thus enhance their pronunciation. The study recommends that further research be conducted on the acoustic properties of all Pashto and English speech sounds.
... The perception of sensory events is influenced by spatial and temporal context (Fraisse, 1984; Hirsh, 1959; Lisker & Abramson, 1964; Stilp, 2020; Von & Pöppel, 1991). Forward masking is a phenomenon in which a preceding sound, the masker, causes a temporary reduction in the detectability of a subsequent target sound (Elliott, 1969; Plomp, 1964). ...
Preprint
Full-text available
In forward masking the detection threshold for a target sound (probe) is elevated due to the presence of a preceding sound (masker). Although many factors are known to influence the probe response following a masker, the current work focused on the temporal separation (delay) between the masker and probe and the inter-trial interval (ITI). Human probe thresholds recover from forward masking within 150 to 300 ms, similar to neural threshold recovery in the inferior colliculus (IC) within 300 ms after tone maskers. Our study focused on the recovery of the discharge rate of IC neurons in response to probe tones after narrowband Gaussian noise (GN) forward maskers, with varying time delays. Additionally, we examined how prior masker trials influenced IC rates by varying the ITI. Our findings showed that previous masker trials impacted probe-evoked discharge rates, with full recovery requiring ITIs over 1.5 s after 70 dB SPL narrowband GN maskers. Neural thresholds in the IC for probes preceded by noise maskers were in the range observed in psychoacoustical studies. Two proposed mechanisms for forward masking, persistence and efferent gain control, were tested using rate analyses or computational modeling. A physiological model with efferent feedback gain control had responses consistent with trends in the physiological recordings.
... In this scenario, L2 sounds are consistently mapped onto existing L1 categories, hindering the formation of new, distinct L2 categories. For example, native speakers of French or Spanish may produce the English /t/ with an intermediate voice onset time (VOT) value (Lisker & Abramson, 1964; Abramson & Lisker, 1967), blending characteristics of both languages (Flege et al., 2003). Category dissimilation happens when new L2 categories are established and diverge from L1 categories in a common phonetic space to maintain perceptual contrast (Lindblom, 1990). ...
Article
Full-text available
This study investigated the neural mechanisms underlying bilingual speech perception of competing phonological representations. A total of 57 participants were recruited, consisting of 30 English monolinguals and 27 Spanish-English bilinguals. Participants passively listened to stop consonants while watching movies in English and Spanish. Event-Related Potentials and sLORETA were used to measure and localize brain activity. Comparisons within bilinguals across language contexts examined whether language control mechanisms were activated, while comparisons between groups assessed differences in brain activation. The results showed that bilinguals exhibited stronger activation in the left frontal areas during the English context, indicating greater engagement of executive control mechanisms. Distinct activation patterns were found between bilinguals and monolinguals, suggesting that the Executive Control Network provides the flexibility to manage overlapping phonological representations. These findings offer insights into the cognitive and neural basis of bilingual language control and expand current models of second language acquisition.
... Hearing devices limit the transmission of spectral acoustic information in the speech signal due to, in the case of HAs, the implementation of signal-processing algorithms such as frequency compression and, in the case of CIs, the segregation of the speech signal into discrete frequency bands (Peng et al., 2019;van Tasell, 1993;Xu et al., 2005). PoA contrasts, which are cued by spectro-temporal information (i.e., formant transitions), may therefore be less reliably transmitted by HAs and CIs than voicing contrasts, where the primary cues are temporal (i.e., Voice Onset Time for onsets and closure duration and vowel length for codas) (Lisker & Abramson, 1964;Song et al., 2012). Nevertheless, both contrast types appear challenging to perceive for children with HAs and CIs (Johnson et al., 1984;Mildner et al., 2009;Peng et al., 2019). ...
Article
Full-text available
This study investigates how phonological competition affects real-time spoken word recognition in deaf and hard of hearing (DHH) preschoolers compared to peers with hearing in the normal range (NH). Three-to-six-year-olds (27 with NH, 18 DHH, including uni- and bilateral hearing losses) were instructed to look at pictures that corresponded to words alongside a phonological competitor (e.g., /bin-pin/) vs. an unrelated distractor (e.g., /toy-bed/). Phonological competitors contrasted in either voicing or place of articulation (PoA), in the onset or coda of the word. Relative to peers with NH, DHH preschoolers showed reduced looks to target in reaction to the spoken words specifically when competition was present. DHH preschoolers may thus, as a group, experience increased phonological competition during word recognition. There was no evidence that phonological properties (voicing vs. PoA, or onset vs. coda) differentially impacted word recognition.
... Examples include fundamental frequency, voice onset time (VOT), amplitude, and harmonic structure. For instance, the perceptual distinction between the English plosives /p/ and /b/ is influenced not only by VOT but also by factors such as periodic pulsing at the voice pitch frequency and noise in the frequency range of the higher formants (Lisker & Abramson, 1964). Learners whose L1 does not utilize these additional cues might find it challenging to accurately produce and perceive these plosive contrasts, as is the case for Korean learners of English (Kong; Yoon, 2013). ...
Article
Full-text available
This study investigates how Brazilian learners of English as a second language produce the duration contrast between long vowels ([i, u]) and short lax vowels ([ɪ, ʊ]), with a focus on cue-weighting strategies. Given that Brazilian Portuguese has no phonemic vowel-length contrast, the question arises of how learners acquire this property in English. The study assesses the influence of acoustic cues such as vowel duration, F1, and F2 on learners' productions at different proficiency levels. Using Pillai scores to assess category separation and a Maximum Entropy (MaxEnt) model to estimate cue weights, the analysis reveals how these cues are integrated into learners' phonological systems. The results show that lower-proficiency learners rely heavily on vowel duration, whereas higher-proficiency learners incorporate spectral cues (F1, F2) more consistently, especially for front vowel contrasts ([i, ɪ]). For back vowels ([u, ʊ]), however, even advanced learners show limited cue integration, as indicated by substantial overlap in their acoustic space. Pillai scores demonstrate greater category separation for advanced learners, especially for front vowels, but inconsistencies remain in back vowel distinctions. The MaxEnt analysis highlights that duration receives higher weights for back vowel contrasts, while F1 and F2 play more significant roles for front vowel contrasts at higher proficiency levels. These findings suggest that although learners progressively adjust their cue-weighting strategies as proficiency increases, L1 transfer effects remain prominent, particularly in the reliance on vowel duration, thus contributing to our understanding of how L2 phonological contrasts are developed. Keywords: cue weighting; duration contrast; Maximum Entropy model; L2 phonology acquisition; vowel.
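As a rough illustration of the cue-weighting idea in this abstract, the Python sketch below fits a logistic regression (a maximum-entropy classifier) to simulated duration, F1, and F2 values for a tense/lax vowel pair and reads the standardized coefficients as cue weights. The data and the model are assumptions made here for illustration; they do not reproduce the study's MaxEnt grammar or its Pillai-score analysis.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Simulated productions: tense [i] (label 1) vs lax [ɪ] (label 0).
# Columns: duration (ms), F1 (Hz), F2 (Hz). Values are illustrative only.
tense = np.column_stack([rng.normal(130, 15, n), rng.normal(300, 30, n), rng.normal(2300, 120, n)])
lax   = np.column_stack([rng.normal(90, 15, n),  rng.normal(430, 30, n), rng.normal(2000, 120, n)])
X = np.vstack([tense, lax])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Standardize so the logistic coefficients are comparable across cues.
Xz = StandardScaler().fit_transform(X)
clf = LogisticRegression().fit(Xz, y)

for cue, w in zip(["duration", "F1", "F2"], clf.coef_[0]):
    print(f"{cue:8s} weight = {w:+.2f}")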
... Finally, the STCF has been extended to speech perception. Speech cues often arrive asynchronously; for instance, a /b/ is cued by Voice Onset Time (VOT) at word onset, as well as by the pitch and duration of the following vowel [52,53]. Recent studies using the visual world paradigm (VWP) suggest that individual acoustic cues are integrated with lexical representations immediately ...
Preprint
Speech processing requires listeners to map temporally unfolding input to words. There has been consensus around the principles governing this process: lexical items are activated immediately and incrementally as speech arrives, perceptual and lexical representations rapidly decay to make room for new information, and lexical entries are temporally structured. In this framework, speech processing is tightly coupled to the temporally unfolding input. However, recent work challenges this: low-level auditory and higher-level lexical representations do not decay but are retained over long durations; speech perception may require encapsulated memory buffers; lexical representations are not strictly temporally structured; and listeners can delay lexical access substantially in some circumstances. These findings argue for a deep revision to models of word recognition.
... Three types of stop voicing categories are traditionally distinguished in terms of VOT: (1) pre-voiced stops (or lead voicing), in which voicing occurs before the release of the stop (negative VOT values); (2) short-lag unaspirated stops, in which voicing is simultaneous with the release or occurs shortly thereafter (approximately 1-30 milliseconds) and (3) long-lag aspirated stops, in which voicing occurs with a significant time lag after the release (approximately 30 to 80 milliseconds) (Lisker & Abramson, 1964). Acoustic analyses of stop productions in Spanish and English indicate that Spanish voiced stop consonants are usually prevoiced; that is, they are characterized by the onset of voicing prior to the release of the stop, whereas the onset of voicing for English voiced stop consonants typically begins shortly after the release of the stop burst, in the short-lag range. ...
Article
Full-text available
This study examined English VOT productions by 37 Spanish-English bilingual children and 37 matched functional monolinguals, all aged 3-6 years, from the same Latinx community. It also assessed the bilinguals' Spanish stop productions and investigated the effects of age and language exposure on their VOT productions. The results revealed credible between-group differences on English voiced, but not voiceless, stops, with shorter VOTs for bilinguals. However, both groups exhibited similar pre-voicing levels, which may suggest an effect of the community language, Spanish, not only on the bilinguals' English VOT patterns but also the monolinguals'. The study also found cross-linguistic differentiation of voiceless stops, but not voiced ones, in the bilinguals' productions and revealed effects of age and exposure not only on VOT in Spanish but also in the majority language, English. These findings have important implications for the conceptualization of monolingual-bilingual comparisons in settings where the community and majority language coexist.
Highlights:
• Functional monolingual and bilingual children produce English voiceless stops alike
• Bilinguals produce English voiced stops with lower VOT values than monolinguals
• Both monolinguals and bilinguals considerably pre-voice English voiced stops
• Age increases bilinguals' likelihood of pre-voicing in both Spanish and English
• Greater exposure to English results in less pre-voicing across both languages
... release of these sounds, is found in many languages such as English, Mandarin, and the Indic languages (3,4). In English, aspiration is considered a phonetic feature of voiceless stops. ...
Article
Full-text available
Introduction: This study investigates Mandarin-speaking children's acquisition of aspirated/unaspirated voiceless consonants in terms of perception and production, to track children's developmental profile and explore the factors that may affect their acquisition, as well as the possible association between perception and production.
Methods: Mandarin-speaking children (N = 95) aged 3–5 and adults (N = 20) participated in (1) a perception test designed based on minimal pairs of unaspirated/aspirated consonants in quiet and noisy conditions, respectively; and (2) a production test where participants produced the target words, with syllable-initial consonants focusing on aspiration and non-aspiration. Six pairs of unaspirated/aspirated consonants in Mandarin were included.
Results: (1) Children's perception and production accuracy of aspirated and unaspirated consonants increased with age. Five-year-olds achieved high accuracy in perception under the quiet condition and in production (over 90%), though not yet adult-like. (2) Noise adversely affected children's perception, with all child groups showing poor performance in the noisy condition. In terms of perception, stops were more challenging to children than affricates, but in terms of production, children performed better on stops. Furthermore, the presence of noise had a greater detrimental effect on the perception of aspirated consonants compared to unaspirated ones. (3) A weak positive correlation was found between children's perception of consonant aspiration in the quiet condition and their production.
Discussion: The findings indicate that age, aspiration state, and manner of articulation (MOA) affect children's acquisition of consonant aspiration. Although 5-year-olds have almost acquired aspirated/unaspirated consonants, compared to adults, the perception of consonant aspiration in noise remains a challenge for children.
... Phonemic differences refer to the varying phoneme inventories across languages, where different values are used to distinguish phonetic or phonemic contrasts. For instance, different voice onset time (VOT) values are employed to differentiate categories in each language (Lisker and Abramson, 1964), with ...
Preprint
Full-text available
Dysarthria, a motor speech disorder, severely impacts voice quality, pronunciation, and prosody, leading to diminished speech intelligibility and reduced quality of life. Accurate assessment is crucial for effective treatment, but traditional perceptual assessments are limited by their subjectivity and resource intensity. To mitigate these limitations, automatic dysarthric speech assessment methods have been proposed to support clinicians in their decision-making. While these methods have shown promising results, most research has focused on monolingual environments. However, multilingual approaches are necessary to address the global burden of dysarthria and ensure equitable access to accurate diagnosis. This thesis proposes a novel multilingual dysarthria severity classification method by analyzing three languages: English, Korean, and Tamil.
... They can also be used as tools in clinically- or theoretically-focused phonetic studies that utilize acoustic properties as a dependent measure. For example, voice onset time, a key feature distinguishing voiced and voiceless consonants across languages [3], is important in ASR [4], clinical [5], and theoretical studies [6]. ...
Preprint
We describe and analyze a simple and effective algorithm for sequence segmentation applied to speech processing tasks. We propose a neural architecture that is composed of two modules trained jointly: a recurrent neural network (RNN) module and a structured prediction model. The RNN outputs are considered as feature functions to the structured model. The overall model is trained with a structured loss function which can be designed for the given segmentation task. We demonstrate the effectiveness of our method by applying it to two simple tasks commonly used in phonetic studies: word segmentation and voice onset time segmentation. Results suggest the proposed model is superior to previous methods, obtaining state-of-the-art results on the tested datasets.
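The abstract above describes an RNN whose outputs feed a structured prediction model trained with a structured loss. The Python (PyTorch) sketch below is a drastically simplified stand-in rather than the authors' architecture: a bidirectional LSTM assigns each acoustic frame an onset score and an offset score, and a brute-force decoder picks the highest-scoring (onset, offset) pair as the VOT segment. The structured training loss, the feature pipeline, and all hyperparameters are omitted or assumed.

import torch
import torch.nn as nn

class BoundaryScorer(nn.Module):
    """Bidirectional LSTM that assigns each frame an onset score and an offset score."""
    def __init__(self, n_feats: int = 13, hidden: int = 64):
        super().__init__()
        self.rnn = nn.LSTM(n_feats, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # column 0: onset score, column 1: offset score

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(frames)   # (batch, time, 2 * hidden)
        return self.head(out)       # (batch, time, 2)

def decode_segment(scores: torch.Tensor) -> tuple[int, int]:
    """Pick the (onset, offset) frame pair with the highest combined score, onset < offset."""
    onset_s, offset_s = scores[:, 0], scores[:, 1]
    best, best_pair = float("-inf"), (0, 1)
    for i in range(len(onset_s) - 1):
        j = int(torch.argmax(offset_s[i + 1:])) + i + 1
        total = float(onset_s[i] + offset_s[j])
        if total > best:
            best, best_pair = total, (i, j)
    return best_pair

# Toy usage with random acoustic frames (e.g., 100 frames of 13 MFCCs).
model = BoundaryScorer()
frames = torch.randn(1, 100, 13)
onset, offset = decode_segment(model(frames)[0])
print(f"predicted VOT segment: frames {onset}..{offset}")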
... It is possible that a 10 ms aspiration is insufficient to indicate voicelessness for the /p, b/ contrast. However, if the VOT boundary lengthens as the place of articulation moves deeper into the vocal cavity [21], /p, b/ should have a shorter VOT boundary than /t, d/. Therefore, we can assume that it is the closure duration that is insufficient to indicate voicelessness in the /p, b/ contrast. ...
... In initial position, Dhuwaya stops are all voiceless unaspirated (see [28]). Their voice onset time is in a similar range as English /b d g/, although a bit longer. ...
Conference Paper
Full-text available
The Dhuwaya language has a stop inventory with a rich set of place contrasts, and a very limited 'strength' contrast that is only found intervocalically in retroflex and (marginally) alveolar stops. The Dhuwaya orthography is overall quite transparent, but the orthography is inherited from a related language with a less limited 'strength' contrast, and for this reason there are 12 stop graphemes but only seven or eight stop phonemes, which causes difficulty in early literacy acquisition. In this study, we provide a first pass at an acoustic description of the Dhuwaya stops, and show that while the stop graphemes do not all signify distinctive phonemes, they are consistently phonetically cued: sounds written with etc. have a short closure while sounds written with have a long closure.
... Infants show an asymmetry between consonants and vowels, losing sensitivity to non-native vowel contrasts by eight months (Kuhl et al., 1992; Bosch and Sebastián-Gallés, 2003) but to non-native consonant contrasts only by 10-12 months (Werker and Tees, 1984). The observed ordering is somewhat puzzling when one considers the availability of distributional information (Maye et al., 2002), which is much stronger for stop consonants than for vowels (Lisker and Abramson, 1964; Peterson and Barney, 1952). Infants are also conservative in generalizing across phonetic variability, showing a delayed ability to generalize across talkers, affects, and dialects. ...
... Speech perception relies on different auditory components, including spectral and temporal phonetic cues, linguistic factors, and contextual information. Among these, voice onset time (VOT), i.e., the duration between the release of a stop consonant and the onset of the following voiced segment, remains a critical cue for speech perception in most languages [1, 2]. However, hearing impairment can disrupt temporal processing, thereby affecting the perception of VOT. ...
Article
Objectives: Voice onset time (VOT) has been identified as a potential temporal cue for predicting children's performance in speech-in-noise tasks, yet the relationship between these two factors has never been explored among children using CI. Hence, the present study aimed to explore the performance of children using CI on a temporal cue-based syllable categorization test and on speech perception in noise, and to examine the relationship between the two.
Methods: The temporal cue-based syllable categorization test was developed by manipulating the /ba/ sound in a 10-step continuum with VOT varied between 74 ms and 26 ms. The developed test and the revised speech-in-noise test for Marathi-speaking children (0 and 5 dB SNR) were administered to thirty children with a unilateral cochlear implant and thirty children with normal hearing, aged between 5 and 7 years.
Results: The Mann-Whitney U test showed significant differences between groups in the temporal cue-based categorization and speech-in-noise tests at 0 dB and 5 dB SNR. Kendall's Tau B revealed a moderate correlation between implant age and scores on the temporal cue-based categorization and speech-in-noise tests at 0 dB SNR, with a strong correlation at 5 dB SNR. Additionally, there was a significant moderate relationship between temporal cue-based categorization and speech-in-noise test scores at both 0 dB and 5 dB SNR.
Conclusion: The present study highlights the importance of temporal cues in speech perception and the need for temporal processing for children using cochlear implants. It reinforces the evidence that speech perception skills improve with implant age.
... The crosslinguistic transfer between L2 learners' first language (L1) and the target L2 is evidenced by research on L2 speech sounds. For instance, aspirating languages, such as Mandarin Chinese and English, contrast stop consonants with a [±aspiration] feature, while voicing languages, like Spanish and Italian, show a [±voice] contrast (Lisker & Abramson, 1964). Accordingly, L1 speakers (L1ers) of aspirating languages produce [+voice] as [-aspiration] in a voicing L2 (Feng & Busà, 2022; Li & Ye, 2022; Liu, 2016), while voicing L1ers assimilate [+aspiration] to [-voice] in an aspirating L2 (Flege & Eefting, 1987; Gorba & Cebrian, 2021; Li et al., 2021; Xi et al., 2020). ...
Article
Full-text available
This study dynamically models the nuclear contours of L2 Spanish sentences produced by 16 Mandarin-speaking learners and 9 Spanish natives, using a speech corpus obtained from a discourse completion task. The target sentences included statements (broad, categorical, and corrective focus) and yes/no questions (information-seeking, confirmation-seeking, and tag questions). Our results indicate that Chinese students (a) could not produce correct nuclear configurations to differentiate between the categorical statement and corrective focus; (b) produced a significantly higher pitch for nuclear pitch accent and lower pitch for unstressed syllables compared to Spanish natives; and (c) may have produced incorrect boundary tones influenced by the lexical stress positions of the nuclear word. This study contributes empirical evidence to the theory of L2 prosodic learning and highlights the importance of fine-grained phonetic details beyond phonological (dis)similarities between learners' L1 and L2 prosody. Furthermore, the observed difficulties in L2 prosody among even experienced learners highlight the need for proper prosodic training paradigms in teaching practice.
... Extensive evidence now exists, however, indicating that phonetic implementation, also referred to as phonetic realization, differs substantially across languages and dialects (Lisker & Abramson 1964, Disner 1983, Gordon et al. 2002, Fuchs & Toda 2010, Reidy 2016. For instance, the precise phonetic realization of a speech sound like [s] results in a higher peak frequency in English than in Japanese (Reidy 2016) and varies more generally from language to language (Gordon et al. 2002, Li et al. 2007, Fuchs & Toda 2010; it also varies by gender beyond any anatomical explanation (Heffernan 2004), sexual orientation (Linville 1998), and socioeconomic status (Stuart-Smith et al. 2003). ...
Article
Full-text available
Understanding the range and limits of crosslinguistic variation stands at the core of linguistic typology and basic science. Linguistic typology is concerned with the relevant dimensions along which languages can vary and those along which they remain stable; an overarching goal is to understand the cognitive, physical, social, and historical factors that shape language. Phonetics is no exception to this enterprise, but it has faced obstacles in crosslinguistic data collection and processing power. The field has nevertheless established a solid foundation regarding the relevant dimensions of stability, revealing strong phonetic tendencies across languages (i.e., universals). This article provides an overview of phonetic universals with a summary of previously attested descriptive and analytic phonetic universals and consideration of methodological aspects when investigating phonetic universals. The increasing availability of multilingual speech data along with advanced speech processing tools promises a new era for investigations into crosslinguistic phonetic variation and systematicity.
Article
Full-text available
Uzbek (ISO 639-1: uz) is a Turkic language spoken mainly in Uzbekistan, where the language is accorded the ‘state language’ status (Figure 1). Outside Uzbekistan, ethnic Uzbek populations are scattered across and beyond Central Asia in such countries as Afghanistan, Tajikistan, Kyrgyzstan, Kazakhstan, China, and Saudi Arabia (Balcı, 2004; Yakup, 2020:411). Many Uzbeks in the diaspora speak one or more languages in addition to Uzbek for interethnic communication (Naby, 1984:11). Some ethnic Uzbek communities are reportedly being linguistically assimilated to ethnic groups that are dominant in their countries or regions (Shalinsky, 1979:12–13; Fevzi, 2013:256; Yıldırım, 2019:64). It is therefore unclear exactly what proportion of ethnic Uzbeks retain Uzbek as their first language today. In the case of ethnic Uzbeks in Xinjiang in China, gauging the extent of linguistic assimilation can be difficult because of the limited range of contrasting features that exist between their variety of Uzbek and Uyghur, the interethnic language of Xinjiang, with which it is generally mutually intelligible (Cheng & Abudureheman, 1987:1–2). The varieties of Uzbek spoken in Afghanistan and China have developed autonomously from those spoken within the borders of the former Soviet Union, and hence differ from the present-day standard Uzbek of Uzbekistan, a former Soviet republic, most notably in lexica but also in phonology, morphology, and syntax (Jarring, 1938; Abdullaev, 1979; Reichl, 1983; Cheng & Abudureheman, 1987; Hayitov et al., 1992:36; Gültekin, 2010).
Chapter
This entry provides an overview of speech analysis software, also known as Acoustic Analysis Software Packages (AASPs), for world Englishes research. The entry first introduces AASPs, with an overview of waveforms and spectrograms, visual representations of the sound spectrum of speech commonly generated in AASPs for speech analysis. The entry then outlines the benefits and drawbacks of using AASPs for speech research. An overview of the most commonly used AASPs, including Praat, WaveSurfer, and Audacity, follows. The entry then discusses the application of AASPs to the analysis of phonetic and phonological features of world Englishes. The entry concludes with a summary of the main benefits and uses of AASPs for world Englishes research.
Article
Full-text available
The two experiments described in this paper were designed to investigate further the phenomenon called motor-motor adaptation. In the first investigation, subjects were adapted while noise was presented through headphones, which prevented them from hearing themselves. In the second experiment, subjects repeated an isolated vowel, as well as a consonant-vowel syllable which contained a stop consonant. The findings indicated that motor-motor adaptation is not a product of perceptual adaptation, and it is not a result of subjects producing longer voice onset times after adaptation to a voiced consonant rather than shorter voice onset times after adaptation to a voiceless consonant.
Article
Full-text available
Voice onset times of /d/ and /t/ were measured for 16 adult subjects (age range 21 to 26 years) under conditions of sobriety and intoxication. Subjects consumed beer to reach intoxication levels between 0.075 and 0.100%, as measured with a portable breathalyzer. Analysis indicated that each subject's VOT variability was consistent over time and resistant to the influence of alcohol.
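For readers unfamiliar with how such measurements are operationalized, the sketch below computes VOT per token as the interval between burst release and voicing onset and summarizes per-subject variability with the coefficient of variation; the data structure and time values are hypothetical and do not reproduce the study's procedure.

```python
# Hedged sketch: compute VOT per token and per-subject variability.
# The annotation format (burst release time, voicing onset time, in seconds)
# is a hypothetical stand-in for hand-labelled landmarks.
from statistics import mean, stdev

tokens = [
    # (subject, stop, burst_release_s, voicing_onset_s); values are invented
    ("S01", "t", 1.203, 1.278),
    ("S01", "t", 3.417, 3.489),
    ("S01", "d", 2.410, 2.425),
    ("S02", "t", 0.954, 1.021),
    ("S02", "d", 1.880, 1.893),
]

by_subject = {}
for subject, stop, burst, voicing in tokens:
    vot_ms = (voicing - burst) * 1000.0       # positive = voicing lags the release
    by_subject.setdefault((subject, stop), []).append(vot_ms)

for (subject, stop), vots in sorted(by_subject.items()):
    cv = stdev(vots) / mean(vots) if len(vots) > 1 else float("nan")
    print(f"{subject} /{stop}/: mean VOT = {mean(vots):.1f} ms, CV = {cv:.2f}")
```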
Article
Full-text available
This study presents a brief investigation of speaker sex differences in the voice onset time of stressed English plosives in word-initial, prevocalic position. Seventy-two short phrases were presented to five male speakers (ages 25 to 37 years, mean 34.2 yr.) and five female speakers (ages 28 to 38 years, mean 32.6 yr.). Analysis showed that the women had, on average, longer voice onset time values than their male peers.
Chapter
This chapter illustrates how an online speech corpus can be employed in educational contexts to enhance phonemic awareness, stimulate research curiosity, and facilitate data-driven learning. The study examines the production and substitution patterns of voiced and voiceless dental fricatives among native Turkish speakers of English, while investigating the influences of age of acquisition and length of residence. Phonemic substitution patterns were analyzed using speech samples from 24 speakers obtained from an online corpus. The findings indicate that native Turkish speakers primarily substitute [t] for /θ/ and [d] for /ð/, with occasional substitutions of [n] for the voiced fricative and [s] for the voiceless one. Participants experienced difficulties with /θ/ in consonant clusters but performed better with /ð/ when it occurred between vowels. The results suggest that early language acquisition and extended residence do not consistently lead to native-like production. Additionally, an analysis of voice onset times was conducted to further evaluate the nature of these substitutions.
Article
Full-text available
This study examined phonatory-articulatory timing during sung productions by trained and untrained female singers with and without singing talent. Thirty-one untrained female singers were divided into two groups (talented or untalented) based on perceptual judgments of singing talent by two experienced vocal instructors. In addition to the untrained singers, 24 trained female singers were recorded singing America the Beautiful, and voice onset time was measured for selected words containing /p, b, g, k/. Univariate analyses of variance indicated that phonatory-articulatory timing, as measured with voice onset time, differed among the three groups for /g/, with the untrained-untalented singers displaying longer voice onset times than the trained singers. No other significant differences were observed across the other phonemes. Despite the significant difference observed, the relatively small effect sizes and low statistical power make it difficult to draw any conclusions regarding the usefulness of voice onset time as an indicator of singing talent.
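A comparison of this kind can be run, for example, as a one-way ANOVA across the three groups; the sketch below uses scipy's f_oneway on invented placeholder values, not the study's data.

```python
# Hedged sketch: one-way ANOVA comparing mean VOT (ms) for /g/ across three
# singer groups, analogous in spirit to the comparison described above.
# The values are invented placeholders, not the study's data.
from scipy.stats import f_oneway

trained              = [18.2, 21.5, 19.8, 22.1, 20.4]
untrained_talented   = [20.1, 23.4, 21.9, 24.0, 22.6]
untrained_untalented = [26.3, 29.1, 27.8, 30.5, 28.2]

f_stat, p_value = f_oneway(trained, untrained_talented, untrained_untalented)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```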
Article
Full-text available
This study investigates the voice onset time (VOT) of stops in Bahdini Kurdish, which are characterized by a three-way laryngeal contrast among voiceless unaspirated, voiceless aspirated, and voiced stops. Thirty native speakers read a forty-word list, which included three examples of each stop in pre-vocalic onset position, three times. Words were chosen on the basis of specific contextual factors to account for place of articulation, laryngeal state, and the height and length of the following vowel. The findings show that VOT distinguishes stop categories in Kurdish, with voicing lead indicating voiced stops, short lag indicating voiceless unaspirated stops, and long lag indicating voiceless aspirated stops. Results of the linear mixed-effects model show that laryngeal state, place of articulation, and following vowel height and length had significant effects on VOT. The gender of the participants, however, showed no significant effect on VOT. In line with most research on the effect of place of articulation on VOT, among the voiceless aspirated stops bilabials had the shortest VOT, followed by dentals and velars. Voiceless unaspirated bilabials had the shortest VOT values, followed by dentals, uvulars, and then velars. Voiced stops do not show such a pattern. These results are compatible with other research on Indo-Iranian languages with three-way laryngeal categories.
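A comparable linear mixed-effects specification could be written in Python as sketched below, with speaker as a random intercept; the file name and column names are assumptions about how such a data set might be organized, not the authors' actual coding.

```python
# Hedged sketch of a linear mixed-effects model for VOT with a by-speaker
# random intercept, in the spirit of the analysis described above.
# The CSV file and column names (vot_ms, laryngeal, place, vowel_height,
# vowel_length, speaker) are assumptions, not the authors' data set.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("kurdish_vot.csv")

model = smf.mixedlm(
    "vot_ms ~ laryngeal + place + vowel_height + vowel_length",
    data,
    groups=data["speaker"],
)
result = model.fit()
print(result.summary())
```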
Article
Acoustic-phonetic perception refers to the ability to perceive and discriminate between speech sounds. Acquired impairment of acoustic-phonetic perception is known historically as ‘pure word deafness’ and typically follows bilateral lesions of the cortical auditory system. The extent to which this deficit occurs after unilateral left hemisphere damage and the critical left hemisphere areas involved are not well defined. We tested acoustic-phonetic perception in 73 individuals with chronic left hemisphere stroke and performed multivariate lesion-symptom mapping incorporating controls for non-specific task confounds, pure tone hearing loss, response bias and lesion size. Separate analyses examined place of articulation, manner of articulation, voicing and vowel discriminations. Overlap of the lesion map with transcallosal pathways linking left and right temporal lobes was examined using a probabilistic diffusion tensor tractography map of these pathways obtained from a healthy control cohort. Compared to an age- and education-matched control sample, 18% of the patients had impaired acoustic-phonetic perception overall, with 44% impaired on voicing, 26% on manner, 15% on place and 14% on vowel discrimination. Lesion-symptom mapping revealed the most critical areas to be the transverse temporal gyrus (TTG) and adjacent medial belt cortex, the acoustic radiation and the posterior superior temporal sulcus (pSTS). There were notable differences between lesion correlates for the different types of discrimination, with place discrimination linked to medial TTG, vowel discrimination to lateral TTG and planum temporale, manner discrimination to posterior planum temporale and voicing discrimination to pSTS. Overlap of the main lesion map with transcallosal temporal lobe pathways was minor but included a deep white matter component at the base of the middle and inferior temporal gyri. The extent of overlap between individual lesions and the transcallosal pathway map was not correlated with acoustic-phonetic perception. The results add further evidence that acoustic-phonetic impairments, particularly impairments of voicing perception, are relatively common after unilateral left temporal lobe damage, and they clarify the lesion correlates of these deficits. Differences between the lesion maps for the discrimination types likely reflect differential reliance on spectral versus temporal analysis for these discriminations.
Article
This study investigates whether listeners' cue weighting predicts their real-time use of asynchronous acoustic information in spoken word recognition at both the group and the individual level. By focusing on the time course of cue integration, we seek to distinguish between two theoretical views: the associated view (cue weighting is linked to cue integration strategy) and the independent view (no such relationship). The current study examines Seoul Korean listeners' (n = 62) weighting of voice onset time (VOT, available earlier in time) and onset fundamental frequency of the following vowel (F0, available later in time) when perceiving Korean stop contrasts (Experiment 1: cue-weighting perception task) and the timing of VOT integration when recognizing Korean words that begin with a stop (Experiment 2: visual-world eye-tracking task). The group-level results reveal that the timing of early cue (VOT) integration is delayed when the later cue (F0) serves as the primary cue to the stop contrast, supporting a relationship between cue weighting and the timing of cue integration (the associated view). At the individual level, listeners who relied more on F0 than on VOT exhibited a further delayed integration of VOT. These findings suggest that the real-time processing of asynchronously occurring acoustic cues for lexical activation is modulated by the weight that listeners assign to those cues, providing evidence for the associated view of cue integration. This study offers insights into the mechanisms of cue integration and spoken word recognition and sheds light on variability in cue integration strategies among listeners.
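Cue weights of the kind examined in Experiment 1 are commonly estimated by regressing categorization responses on the two cues; the sketch below derives relative weights from standardized logistic-regression coefficients on simulated data and is meant only as an illustration of one such approach, not the authors' exact procedure.

```python
# Hedged sketch: estimate relative perceptual weights of VOT and F0 from
# binary categorization responses via logistic regression. The data here
# are simulated; standardized coefficients serve as cue-weight proxies.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
vot = rng.uniform(0, 100, n)          # ms
f0 = rng.uniform(180, 300, n)         # Hz
# Simulated listener who relies more on F0 than on VOT:
logit = 0.02 * (vot - 50) + 0.05 * (f0 - 240)
responses = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = StandardScaler().fit_transform(np.column_stack([vot, f0]))
clf = LogisticRegression().fit(X, responses)
w_vot, w_f0 = np.abs(clf.coef_[0])
print(f"relative F0 weight: {w_f0 / (w_vot + w_f0):.2f}")
```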
Article
The properties of stop-closure bursts are studied using a database of 39 speakers producing single-digit and multi-digit numerals, recorded in parallel on a telephone handset and a directional microphone. Burst detection is performed with short-term and long-term detectors of spectral-temporal inhomogeneities, as well as a detector based on the similarity between the eigenfunctions of the consonant burst spectrum and the current spectrum of the speech burst. The probability of a voiced or voiceless closure is estimated, in both the amplitude spectrum and the group-delay spectrum, from the ratio of energy in the high- and low-frequency ranges. The place of articulation of back-lingual (velar) consonants affects the probability distributions of the interval between burst onset and vowel onset, the frequency of the highest-amplitude peak in the high-frequency region, the high-to-low frequency energy ratio of the burst spectrum, and the similarity measures between the eigenfunctions of the consonant burst spectrum and the current spectrum of the speech burst.
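The high-to-low frequency energy ratio mentioned above can be illustrated with a short sketch that computes the ratio from an FFT magnitude spectrum of a burst segment; the 2 kHz split frequency and the synthetic input are arbitrary illustrative choices.

```python
# Hedged sketch: ratio of spectral energy above vs. below a split frequency
# for a short burst segment, one of the cues described above. The split at
# 2 kHz is an arbitrary illustrative value.
import numpy as np

def high_low_energy_ratio(segment, rate, split_hz=2000.0):
    """Return energy above split_hz divided by energy below it."""
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / rate)
    high = spectrum[freqs >= split_hz].sum()
    low = spectrum[freqs < split_hz].sum()
    return high / low if low > 0 else float("inf")

# Example with synthetic noise standing in for a burst segment:
rng = np.random.default_rng(1)
burst = rng.normal(size=512)
print(f"high/low energy ratio: {high_low_energy_ratio(burst, rate=16000):.2f}")
```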
Article
Aims: How bilinguals control multiple languages is the object of intense recent scientific debate. Empirical research on language control at various linguistic levels has remained scarce, and language control at the phonetic level is particularly underexplored. The present study aimed to examine the dynamics of phonetic-level language control during speech production.
Design: Chinese-English-German speakers named letters of the alphabet in English (L2) or German (L3), either in single-language blocks or in alternate-language mixed blocks. Letters vary in how phonetically similar their pronunciations are across the two languages, which allows cross-language phonetic influences to be explored.
Data and analysis: Three-way repeated-measures analyses of variance (ANOVAs) with trial type (non-switch vs. single-language for mixing costs; non-switch vs. switch for switch costs), response language (English/L2 vs. German/L3), and phonetic similarity (similar vs. neutral vs. different) as factors were conducted on 52 subjects' response times and accuracy, separately for mixing costs and switch costs.
Findings: Results showed substantial mixing and switch costs, as well as a "reversed language dominance" effect, suggesting inhibitory control in response to cross-language phonetic interference. Cross-language facilitation was observed for phonetically similar letters, and mixing/switch costs were modulated by phonetic similarity in a complex pattern.
Originality: The findings show a complex interplay of suppression (e.g., as indexed by switch costs) and facilitation (i.e., the effect of phonetic similarity between letter translation equivalents).
Significance: The evidence for cross-language phonetic interference as well as facilitation effects at local and global levels of control implies a dynamic interaction between the two phonetic systems.
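An analysis of the type described under Data and analysis could be specified as in the sketch below, using statsmodels' AnovaRM on a long-format table; the file and column names are assumptions, and this function requires a balanced within-subject design.

```python
# Hedged sketch: three-way repeated-measures ANOVA on response times using
# statsmodels' AnovaRM. The long-format CSV and its column names
# (subject, rt, trial_type, language, similarity) are assumptions about how
# such data could be organized; AnovaRM requires a balanced design.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.read_csv("switching_rts_long.csv")

aov = AnovaRM(
    data,
    depvar="rt",
    subject="subject",
    within=["trial_type", "language", "similarity"],
    aggregate_func="mean",   # average over repetitions within each design cell
).fit()
print(aov)
```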
Chapter
Bilingualism and the study of speech sounds are two of the largest areas of inquiry in linguistics. This Handbook sits at the intersection of these fields, providing a comprehensive overview of the most recent, cutting-edge work on the sound systems of adult and child bilinguals. Bringing together contributions from an international team of world-leading experts, it covers all aspects of the speech perception, production and processing of bilingual individuals, as well as surveying cross-linguistic influences on the phonetics and phonology of bilingualism. The thirty-five chapters are divided into thematic areas covering the theoretical foundations and methodological approaches employed to investigate bilingual speech, overviews of major findings and developments in child and adult bilingual phonology and phonetics, descriptions of the major areas of research within the speech perception, production and processing of the bilingual individual, and examinations of various predictors of cross-linguistic influence and variables affecting the outcomes of bilingual speech.
Preprint
Full-text available
This paper investigates the role of F2 and VOT in the realization of the emphasis contrast among speakers of Arabic varieties of the Levant (Lebanese, Syrian) and the Gulf (Saudi, Qatari). The results show that the two dialect groups systematically differ in the acoustic realization of plain and emphatic voiceless stops. While the Lebanese and Syrian varieties show the traditional pattern, in which the contrast is predominantly realized as a difference in F2 (plain: 1808 Hz; emphatic: 1097 Hz), the Qatari and Saudi varieties show a pattern with VOT as the main acoustic correlate: plain [t] is produced with aspiration (M = 72 ms), and emphatic [tˁ] is unaspirated (M = 17 ms). The F2 difference in Gulf speech is, in contrast, smaller: the low vowel [aː] is back in both contexts, with more retraction in the emphatic context (plain: 1230 Hz; emphatic: 1108 Hz).
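Measurements of F2 at a vowel midpoint are commonly taken in Praat or through its Python interface; the sketch below uses the praat-parselmouth library with placeholder file names and time points, and the study's own measurement settings may well differ.

```python
# Hedged sketch: extract F2 at a vowel midpoint with the praat-parselmouth
# library. The file name and the time points marking the vowel are
# placeholders; the study's own measurement settings may differ.
import parselmouth

sound = parselmouth.Sound("token.wav")
formants = sound.to_formant_burg()            # Burg-algorithm formant tracking

vowel_start, vowel_end = 0.25, 0.45           # placeholder segment boundaries (s)
midpoint = (vowel_start + vowel_end) / 2
f2_hz = formants.get_value_at_time(2, midpoint)
print(f"F2 at vowel midpoint: {f2_hz:.0f} Hz")
```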