Article

Use of semantic context and F0 contours by older listeners during Mandarin speech recognition in quiet and single-talker interference conditions


Abstract

This study followed up Wang, Shu, Zhang, Liu, and Zhang [(2013). J. Acoust. Soc. Am. 134(1), EL91–EL97] to investigate factors influencing older listeners' Mandarin speech recognition in quiet vs. single-talker interference conditions. Listening condition interacted significantly with F0 contours but not with semantic context, revealing that natural F0 contours provided a benefit in the interference condition, whereas semantic context contributed similarly to both conditions. Furthermore, the significant interaction between semantic context and F0 contours demonstrated the importance of semantic context when F0 was flattened. Together, findings from the two studies indicate that aging differentially affects tonal language speakers' dependence on F0 contours and semantic context for speech perception in suboptimal conditions.


... Furthermore, when a degraded speech signal is presented against interference, listeners benefit even more from semantic context. For example, the intelligibility difference between acoustically degraded sentences and semantically unrelated words is much greater when they are presented in suboptimal listening backgrounds than in quiet, indicating that listeners rely more on the top-down semantic context to aid speech recognition and comprehension in adverse conditions [16][17][18]. ...
... Specifically, to assess the role of contextual semantic integration in speech recognition, we introduced acoustic manipulations in two speech-in-noise conditions, one with natural F0 contours kept in the target sentences presented against interfering background speech, and the other with flattened F0 contours that disrupted the critical cue for Chinese lexical tones for proper word recognition. This speech-in-noise test protocol with Mandarin Chinese materials was previously adopted in a number of studies on elementary school students, middle school students, and adults, including the elderly population [16,18,25], and the results demonstrate that greater auditory semantic integration at the sentence level is required to recognize the words in the F0-degraded condition. Furthermore, two statistical models, i.e., the multiplicative model (product of the two subskills) and the additive model (sum of the two subskills), were tested to clarify how the subskills would predict the variance in reading comprehension. ...
... In the present study, the extent of semantic integration needed for recognizing the target spoken sentences was manipulated by presenting two types of stimuli. Specifically, in suboptimal listening conditions, greater auditory semantic integration is required for successful recognition of speech with flattened F0 contours compared with speech with natural F0 contours [16,18]. Interestingly, our results show that recognition of speech with flattened F0 contours contributed to reading comprehension, while the contribution decreased dramatically when the natural F0 contours in the target spoken sentences were intact. ...
Article
Full-text available
Theories of reading comprehension emphasize decoding and listening comprehension as two essential components. The current study aimed to investigate how Chinese character decoding and context-driven auditory semantic integration contribute to reading comprehension in Chinese middle school students. Seventy-five middle school students were tested. Context-driven auditory semantic integration was assessed with speech-in-noise tests in which the fundamental frequency (F0) contours of spoken sentences were either kept natural or acoustically flattened, with the latter requiring a higher degree of contextual information. Statistical modeling with hierarchical regression was conducted to examine the contributions of Chinese character decoding and context-driven auditory semantic integration to reading comprehension. Performance in Chinese character decoding and auditory semantic integration scores with the flattened (but not natural) F0 sentences significantly predicted reading comprehension. Furthermore, the contributions of these two factors to reading comprehension were better fitted with an additive model instead of a multiplicative model. These findings indicate that reading comprehension in middle schoolers is associated with not only character decoding but also the listening ability to make better use of the sentential context for semantic integration in a severely degraded speech-in-noise condition. The results add to our understanding of multi-faceted reading comprehension in children. Future research could further address the age-dependent development and maturation of reading skills by examining and controlling other important cognitive variables, and apply neuroimaging techniques such as functional magnetic resonance imaging and electrophysiology to reveal the neural substrates and neural oscillatory patterns for the contribution of auditory semantic integration and the observed additive model to reading comprehension.
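The additive-versus-multiplicative comparison described above amounts to regressing reading comprehension on either the sum-style combination (both subskills entered as separate predictors) or their product, and comparing fit. The sketch below illustrates that logic with simulated placeholder scores (the sample size, coefficients, and noise level are illustrative assumptions, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical z-scored subskill measures for n students (illustrative only):
# character decoding and auditory semantic integration with flattened-F0 sentences.
n = 75
decoding = rng.normal(size=n)
integration = rng.normal(size=n)
# Simulate reading comprehension under an additive generating process.
reading = 0.5 * decoding + 0.4 * integration + rng.normal(scale=0.5, size=n)

def r_squared(X, y):
    """Ordinary least squares fit with an intercept; return R^2."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Additive model: reading ~ decoding + integration (sum of the subskills).
r2_additive = r_squared(np.column_stack([decoding, integration]), reading)
# Multiplicative model: reading ~ decoding * integration (product of the subskills).
r2_multiplicative = r_squared((decoding * integration)[:, None], reading)

print(f"additive R^2: {r2_additive:.3f}")
print(f"multiplicative R^2: {r2_multiplicative:.3f}")
```

Because the simulated outcome is generated additively, the additive model explains far more variance here; the study's actual model comparison used hierarchical regression on the real test scores.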
... Sentence recognition requires extra context-driven semantic, syntactic, and pragmatic processing, which is obviated in the recognition of isolated words. The intelligibility advantage of words in sentences over words in isolation when presented in adverse listening conditions has been confirmed by a number of studies on adults and children without dyslexia (e.g., Dubno et al., 2000; Wang et al., 2013; Jiang et al., 2017). However, to the best of our knowledge, only one study has examined sentence-in-noise recognition by dyslexic children. ...
... The aim of the present study was to explore the effects of semantic context and F0 contours on word recognition against interfering speech by comparing Chinese-speaking dyslexic children with chronological age-matched controls. The experiment followed the design of our previous speech intelligibility studies (Jiang et al., 2017; Zhou et al., 2017) by manipulating two factors, namely, semantic context (normal sentence versus wordlist sentence) and F0 contours (utterances with natural versus flat contours). One possible outcome is that semantic context might be similarly used by dyslexic and non-dyslexic children to aid word recognition regardless of the acoustic manipulation. ...
... In tonal languages like Chinese, lexical tones are suprasegmental features, which are phonologically as important as segmental phonemes. As the primary acoustic correlates of Chinese lexical tones, natural F0 contours play a very important role in word recognition, especially when speech is presented in suboptimal listening conditions or without contextual information (Patel et al., 2010; Wang et al., 2013; Jiang et al., 2017). Our previous study revealed that dyslexic children aged 9-11 years were able to use F0 contours to identify Chinese lexical tones of isolated syllables presented in quiet, although their recognition rates were significantly lower than those of age-matched children without dyslexia (Zhang et al., 2012). ...
Article
Full-text available
Previous work has shown that children with dyslexia are impaired in speech recognition in adverse listening conditions. Our study further examined how semantic context and fundamental frequency (F0) contours contribute to word recognition against interfering speech in dyslexic and non-dyslexic children. Thirty-two children with dyslexia and 35 chronological-age-matched control children were tested on the recognition of words in normal sentences versus wordlist sentences with natural versus flat F0 contours against single-talker interference. The dyslexic children had overall poorer recognition performance than non-dyslexic children. Furthermore, semantic context differentially modulated the effect of F0 contours on the recognition performances of the two groups. Specifically, compared with flat F0 contours, natural F0 contours increased the recognition accuracy of dyslexic children less than non-dyslexic children in the wordlist condition. By contrast, natural F0 contours increased the recognition accuracy of both groups to a similar extent in the sentence condition. These results indicate that access to semantic context improves the effect of natural F0 contours on word recognition in adverse listening conditions by dyslexic children who are more impaired in the use of natural F0 contours during isolated and unrelated word recognition. Our findings have practical implications for communication with dyslexic children when listening conditions are unfavorable.
... For example, some studies concluded that the elderly used more context than, or at least as much as, their younger counterparts during speech recognition tests in quiet and in noise (Wingfield et al., 1994; Dubno et al., 2000; Sheldon et al., 2008). Other studies (e.g., Jiang et al., 2017) found that younger adults may use more context than older adults during sentence recognition in noise. ...
... This observation was supported by the PCU (Repeat) PI function, which showed a steep negative slope at SNR < 0 dB. This also implied that CU (Repeat) depends on the demand for semantic context, and that if the test condition was unfavorable, YA could use context as much as or more than OA (Dubno et al., 2000; Aydelott et al., 2010; Jiang et al., 2017). ...
Article
Full-text available
Purpose: To elucidate how aging affects the extent of semantic context use and the reliance on semantic context measured with the Repeat–Recall Test (RRT). Methods: A younger adult group (YA) aged 18–25 and an older adult group (OA) aged 50–65 were recruited. Participants in both groups performed the RRT (sentence repeat and delayed recall tasks, plus subjective listening effort and noise tolerable time) under two noise types and seven signal-to-noise ratios (SNRs). Performance–intensity curves were fitted, and performance at SRT50 and SRT75 was predicted. Results: For the repeat task, the OA group used more semantic context and relied more on semantic context than the YA group. For the recall task, the OA group used less semantic context but relied more on context than the YA group. Age did not affect subjective listening effort but significantly affected noise tolerable time. Participants in both age groups could use more context at SRT75 than at SRT50 on the four tasks of the RRT. At the same SRT, however, the YA group could use more context in the repeat and recall tasks than the OA group. Conclusion: Age affected the use of and reliance on semantic context. Even though the OA group used more context in speech recognition, they failed in speech information maintenance (recall) even with the help of semantic context. The OA group relied more on context while performing repeat and recall tasks. The amount of context used was also influenced by SRT.
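Fitting a performance–intensity curve and reading off SRT50/SRT75, as in the Methods above, can be sketched in a few lines. The logistic curve form, the grid-search fit, and all score values below are illustrative assumptions, not the RRT's actual procedure or data:

```python
import numpy as np

def pi_curve(snr, midpoint, slope):
    """Logistic performance-intensity function: proportion correct vs. SNR (dB)."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - midpoint)))

# Hypothetical repeat-task scores at seven SNRs (placeholder values).
snrs = np.array([-9, -6, -3, 0, 3, 6, 9], dtype=float)
scores = np.array([0.08, 0.20, 0.42, 0.65, 0.84, 0.93, 0.97])

# Coarse least-squares grid search for midpoint and slope
# (a simple stand-in for proper maximum-likelihood psychometric fitting).
midpoints = np.linspace(-10, 10, 201)
slopes = np.linspace(0.05, 1.5, 146)
midpoint, slope = min(
    ((m, s) for m in midpoints for s in slopes),
    key=lambda p: np.sum((pi_curve(snrs, *p) - scores) ** 2),
)

# SRT50 is the SNR at 50% correct, i.e., the logistic midpoint;
# SRT75 solves pi_curve(snr) = 0.75 analytically: snr = midpoint + ln(3)/slope.
srt50 = midpoint
srt75 = midpoint + np.log(3) / slope
print(f"SRT50 = {srt50:.1f} dB, SRT75 = {srt75:.1f} dB")
```

With curves like these fitted per condition, "context use" can then be quantified as the SRT shift between high-context and low-context materials.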
... Our previous research, with Mandarin Chinese speakers as participants, showed an interaction between sentence context and F0 contours during speech comprehension in the quiet condition: normal sentences with flattened F0 contours were comprehended as well as normal sentences with natural F0 contours, whereas syntactically/semantically anomalous sentences with flattened F0 contours were not comprehended as well as anomalous sentences with natural F0 contours (Jiang et al., 2017; Wang et al., 2013; Xu et al., 2013; Zhang et al., 2016). However, the interaction was not significant in the noise condition, because context cannot provide top-down information to guide speech comprehension against a noisy background. ...
... However, in syntactically/semantically anomalous sentences, flattening the F0 contours dramatically reduced comprehension, compared with the mild decrease observed for normal sentences. In other words, people can use context as a cue to comprehend the meaning of sentences, especially sentences with flattened F0 contours, which is consistent with our previous work (Jiang et al., 2017; Wang et al., 2013; Xu et al., 2013; Zhang et al., 2016). ...
Article
Sentence context and fundamental frequency (F0) contours are important factors in speech perception and comprehension. In Mandarin Chinese, lexical tones are distinguished by their F0 contours. Previous studies found that healthy people could use contextual cues to recover the phonological representations of lexical tones from altered tonal patterns and thereby comprehend sentences in quiet, but not in noisy environments. Much research has shown that patients with schizophrenia have deficits in speech perception and comprehension. However, it is unclear how context and F0 contours influence speech perception and comprehension in patients with schizophrenia. This study examined the contributions of context and lexical tone to sentence comprehension in four types of sentences by manipulating context and F0 contours in 32 patients with schizophrenia and 33 healthy controls. The results showed that (1) in patients with schizophrenia, the interaction between context and F0 contours was not significant, whereas it was significant in healthy controls; (2) the scores for the two types of sentences with flattened F0 contours were negatively correlated with hallucination trait scores; and (3) the patients with schizophrenia showed significantly lower scores on the intelligibility of sentences in all conditions, which were negatively correlated with PANSS-P. The patients with schizophrenia could not use contextual cues to recover the phonological representations of lexical tones from the altered tonal patterns when comprehending sentences; inner noise may be the underlying mechanism for these deficits in speech perception and comprehension.
... Specifically, the target sentences included normal/word-list sentences with natural/flat F0 contours, respectively manipulating semantic context and F0 contours. Such manipulations have been adopted in our previous studies (Zhang et al., 2016; Jiang et al., 2017). The normal sentences were 28 declarative Chinese sentences, each comprising 3 to 6 words (2-4 content words plus 0-2 functional words) that were familiar to both the elementary and middle school children. ...
... By contrast, both age groups of children benefited to a similar extent from semantic context when the listening background changed from quiet to the interference condition, which indicates that both elementary and middle school children are capable of using semantic context to resist interfering speech. Speech recognition by young and older adults in quiet and single-talker interference backgrounds has been explored in our previous studies (Wang et al., 2013 for young adults, and Jiang et al., 2017 for older listeners). Although it is unjustifiable to make a direct statistical comparison across the results of different age groups because different word materials and elicitation approaches were adopted, it is meaningful to compare the different patterns of interaction effects between semantic context and F0 contours/listening background across the four age groups. ...
Article
Full-text available
The goal of this developmental speech perception study was to assess whether and how age group modulated the influences of high-level semantic context and low-level fundamental frequency (F0) contours on the recognition of Mandarin speech by elementary and middle-school-aged children in quiet and interference backgrounds. The results revealed different patterns for semantic and F0 information. On the one hand, age group significantly modulated the use of F0 contours, indicating that elementary school children relied more on natural F0 contours than middle school children during Mandarin speech recognition. On the other hand, there was no significant modulation effect of age group on semantic context, indicating that children of both age groups used semantic context to assist speech recognition to a similar extent. Furthermore, the significant modulation effect of age group on the interaction between F0 contours and semantic context revealed that younger children could not make better use of semantic context in recognizing speech with flat F0 contours compared with natural F0 contours, while older children could benefit from semantic context even when natural F0 contours were altered, thus confirming the important role of F0 contours in Mandarin speech recognition by elementary school children. The developmental changes in the effects of high-level semantic and low-level F0 information on speech recognition might reflect the differences in auditory and cognitive resources associated with processing of the two types of information in speech perception.
... However, when multi-speaker babble noise was added, listeners were less accurate in transcribing the monotone sentences compared to sentences with tones intact. Further research has shown that flattened tones may have even stronger impacts on elderly or hearing impaired Mandarin listeners (Jiang, Li, Shu, Zhang, and Zhang 2017). These lines of work suggest that similar difficulties would be likely for L2 speech, where tones are not just flattened, but often misleading. ...
Chapter
Full-text available
This chapter discusses second language pronunciation of Mandarin from the perspective of the native Mandarin speakers who listen to it. For such listeners, second language Mandarin often bears a noticeable foreign accent. I will provide a framework for defining foreign accent and for distinguishing accented pronunciation from pronunciation errors. I will then review the results of research related to foreign-accented Mandarin and how it affects listeners’ judgments, comprehension, and the efficiency with which they process second language Mandarin speech. Naturally, lexical tones will receive special attention in this discussion.
... Deep Learning is helping people to shape and model very complex problems. Just ten years ago no one could have predicted that today machines can surpass human-level performance in feature recognition and detection on an image (Uçar et al., 2017; Deng and Yu, 2014; Szegedy et al., 2013; Krizhevsky et al., 2012; Russakovsky et al., 2015; Berg et al., 2010; Yan et al., 2016), in speech recognition (Higy et al., 2018; Schatz et al., 2018; Izumi et al., 2018; Jiang et al., 2017; Edwards et al., 2017), or in mimicking the human voice realistically (Michaely et al., 2017; Kleijn et al., 2017; van den Oord et al., 2017). ...
Conference Paper
As of the writing of this thesis, we know of about 4000 exoplanets beyond the Solar System. We have a wide variety of known exoplanets, from very hot giant planets to cold Earths. Planetary detections, nevertheless, are not enough to thoroughly investigate the history and chemistry of exoplanets. For this reason, atmospheric characterisation is becoming more critical than ever in exoplanetary science. In the next decade, many space missions such as JWST, Twinkle, and ARIEL, along with ground-based instruments (ELT, TMT), will spectroscopically study exoplanetary atmospheres and help us examine planetary formation and dynamics in more depth. Today, the Hubble/WFC3 camera represents the state of the art for transit spectroscopy. State-of-the-art inverse models for interpreting observed exoplanetary spectra are based on Bayesian analysis, able to sample a sizeable parameter space and to converge on a set of parameters that can explain the structure of exoplanetary spectra. In this thesis, I present the results obtained by applying the UCL Bayesian inverse model, TauREx, to the largest catalogue observed to date. I will demonstrate how it is possible to find water vapour in 16 out of 30 planets chosen from the WFC3 planetary dataset. Often the input spectra are too noisy to obtain a statistically significant result. For this reason, I will introduce the Atmospheric Detection Index (ADI), able to quantify the "goodness" and significance of a molecular detection. The use of complex atmospheric models in Bayesian analysis tools can require a prohibitive amount of time. For this reason, it is crucial to improve the analysis efficiency of complex atmospheres and accelerate their computations.
To speed up the computation of atmospheric spectroscopic retrievals, I developed ExoGAN (Exoplanetary Generative Adversarial Network), a new-generation deep learning algorithm able to learn how to generate atmospheric spectra and retrieve the best set of parameters that can explain the observed spectrum. It consists of a deep convolutional generative adversarial network able to recognise molecular features, abundances, and physical atmospheric parameters. Finally, after describing more "traditional" atmospheric retrieval tools, their optimisation using deep learning algorithms, and their application to real datasets (i.e., the HST/WFC3 camera), I introduce a possible target list of planet candidates for a space mission dedicated to transit spectroscopy: the ARIEL space mission. Target selection is a crucial task: planets with different basic parameters must be selected so as to sample the whole orbital and physical parameter space uniformly. The generation of an optimal target list is highly dependent on the type of instrument, and it will critically influence the science return of the mission.
... There are four types of speech materials, which were created by respectively manipulating semantic context (normal sentence vs. word list) and F0 contours (natural vs. flat; see Figure 1). Similar manipulations have been adopted in our previous studies (Jiang, Li, Shu, Zhang, & Zhang, 2017; Wang et al., 2013; H. Zhou et al., 2017). ...
Article
Purpose: The purpose of the current study was to investigate the extent to which semantic context and F0 contours affect speech recognition by Mandarin-speaking kindergarten-aged children with cochlear implants (CIs). Method: The experimental design manipulated two factors: semantic context, by comparing the intelligibility of normal sentences vs. word lists, and F0 contours, by comparing the intelligibility of utterances with natural vs. flat F0 patterns. Twenty-two children with cochlear implants completed the speech recognition test. Results: Children with cochlear implants could use both semantic context and F0 contours to assist speech recognition. Furthermore, natural F0 patterns provided greater benefit when semantic context was present than when it was absent. Conclusion: Dynamic F0 contours play an important role in speech recognition by Mandarin-speaking children with cochlear implants despite the well-known limitation of cochlear implant devices in extracting F0 information.
Preprint
Full-text available
Purpose: This study aimed to examine how aging and modifications of critical acoustic parameters may affect the perception of whispered speech as a degraded signal. Method: Forty Mandarin-speaking adults were included in the study. Part 1 of the study compared the perception of Mandarin lexical tones, vowels, and syllables in older and younger adults in whispered vs. phonated speech conditions. Parts 2 and 3 further examined how modification of duration and intensity cues contributed to the perceptual outcomes. Results: Perception of whispered tones was compromised in older and younger adults. Older adults identified lexical tones less accurately than their younger counterparts, particularly for phonated T2, T3 and whispered T3. Aging also negatively affected the vowel identification of /i, u/ in the whispered condition. Syllable-level accuracy was largely dependent on the accuracy of lexical tones and vowels. Furthermore, reduced duration led to the decreased accuracy of phonated T3 and whispered T2, T3 but increased accuracy of phonated T4. Reduced intensity lowered the recognition accuracy for phonated vowels /i, ɤ, o, y/ in older adults and /i, u/ in younger adults, and it also lowered the accuracy of whispered vowels /a, ɤ/ in older adults. Contrary to our expectation, increased duration and intensity did not improve older adults’ speech perception in either phonated or whispered conditions. Conclusion: The results suggest that aging adversely affected speech perception in both phonated and whispered conditions with more challenges in identifying whispered speech for older adults. While older adults’ diminished performance may be potentially due to problems with processing the degraded temporal and spectral information of the target speech sounds, it cannot be simply compensated for by increasing the duration and intensity of the target sounds beyond the audible level.
Article
Dynamic F0 contours play an important role in recognizing speech. The present work examined the effect of F0 contour on speech intelligibility for hearing-impaired listeners of Mandarin Chinese in quiet, in steady noise, and in two-talker competing speech. The intelligibility of two types of natural speech was measured: single-tone speech with relatively flat F0 contours and multi-tone speech with time-varying F0 contours. The speech rate and mean F0 of the speech materials were carefully controlled to avoid effects other than that of F0 contour on speech intelligibility. Results showed that intelligibility was significantly higher for speech with a flat F0 contour than for speech with a dynamic F0 contour at a low signal-to-masker ratio in both speech-spectrum noise and the two-talker masker.
Article
Full-text available
Mandarin Chinese speech sounds (vowels × tones) were presented to younger and older Chinese-native speakers with normal hearing. For the identification of vowel-plus-tone, vowel-only, and tone-only, younger listeners significantly outperformed older listeners. The tone 3 identification scores correlated significantly with the age of older listeners. Moreover, for older listeners, the identification rate of vowel-plus-tone was significantly lower than that of vowel-only and tone-only, whereas for younger listeners, there was no difference among the three identification scores. Therefore, aging negatively affected Mandarin vowel and tone perception, especially when listeners needed to process both phonemic and tonal information.
Article
Full-text available
A common complaint of older listeners is that they can hear speech yet cannot understand it, especially when listening to speech in background noise. When target and competing speech signals are presented concurrently, a difference in fundamental frequency (ΔF0) between the competing signals, which determines voice pitch, can be an important and commonly occurring cue that facilitates the separation of the target message from the interfering message, consequently improving the intelligibility of the target message. To address the questions of whether older listeners have a reduced ability to use ΔF0 and how age-related deficits in the processing of ΔF0 are theoretically explained, this paper is divided into three parts. The first part summarizes how the speech-communication difficulties of older listeners are theoretically explained. The second part reviews the literature on the perceptual benefits of ΔF0 and age-related deficits in its use. The final part compares three theoretical models of the general processing of ΔF0 to discuss which best explains the age-related deficits.
Article
Full-text available
This study examined the effects of lexical tone contour on the intelligibility of Mandarin sentences in quiet and in noise. A text-to-speech synthesis engine was used to synthesize Mandarin sentences with each word carrying the original lexical tone, a flat tone, or a tone randomly selected from the four Mandarin lexical tones. The synthesized speech signals were presented to 11 normal-hearing listeners for recognition in quiet and in speech-shaped noise at 0 dB signal-to-noise ratio. Normal-hearing listeners nearly perfectly recognized the Mandarin sentences produced with modified tone contours in quiet; however, performance declined substantially in noise. Consistent with previous findings to some extent, the present findings suggest that lexical tones are relatively redundant cues for Mandarin sentence intelligibility in quiet, and that other cues can compensate for the distorted lexical tone contours. In noise, however, the results provide direct evidence that lexical tone contours are important for the recognition of Mandarin sentences.
Article
Full-text available
This chapter focuses on a set of attentional or executive control processes, all inhibitory, that operate in the service of an individual's goals to narrow and constrain the contents of consciousness to be goal relevant. An uncluttered or narrowly focused "working memory," rather than a large one, is the ideal processing system. The narrow focus maximizes the speed and accuracy of on-line processing because it reduces the likelihood of switching attention to goal-irrelevant representations. The work is similar to that of other investigations in its focus on executive processes as a critical source of working memory variation as well as variation in many cognitive domains. The emphasis on inhibitory processes may be the characteristic that most differentiates their work from others.
Article
Full-text available
Flattening the fundamental frequency (F0) contours of Mandarin Chinese sentences reduces their intelligibility in noise but not in quiet. It is unclear, however, how the absence of the primary acoustic cue for lexical tones might be compensated for by the top-down information of sentence context. In this study, speech intelligibility was evaluated when participants listened to sentences and word lists with or without F0 variations in quiet and in noise. The results showed that sentence context partially explained the unchanged intelligibility of monotonous Chinese sentences in quiet and further indicated that F0 variations and sentence context act in concert during speech comprehension.
Article
Full-text available
This study tested the importance of F0 variation for tone language comprehension. The intelligibility of Mandarin sentences with natural F0 contours was compared to the intelligibility of monotone (flat-F0) sentences created via speech resynthesis. In a quiet background, flat-F0 speech was just as intelligible as natural speech (about 94% intelligible), highlighting the robustness of the language comprehension system. However, when babble noise was added (0 dB SNR), flat-F0 speech was substantially less intelligible than natural speech (60% vs. 80% intelligible), indicating that F0 variation is very important for Mandarin sentence intelligibility in noise.
Article
Full-text available
Research on the exploitation of prosodic information in the comprehension of spoken language is reviewed. The research falls into three main areas: the use of prosody in the recognition of spoken words, in which most attention has been paid to the question of whether the prosodic structure of a word plays a role in initial activation of stored lexical representations; the use of prosody in the computation of syntactic structure, in which the resolution of global and local ambiguities has formed the central focus; and the role of prosody in the processing of discourse structure, in which there has been a preponderance of work on the contribution of accentuation and deaccentuation to integration of concepts with an existing discourse model. The review reveals that in each area progress has been made towards new conceptions of prosody's role in processing, and in particular this has involved abandonment of previously held deterministic views of the relationship between prosodic structure and other aspects of linguistic structure.
Article
Full-text available
Older adults are not as good as younger adults at decoding prosodic emotions. We sought to determine the specificity of this finding. Performance of older and younger adults was compared on a prosodic emotion task, a "pure" prosodic emotion task, a linguistic prosody task, and a "pure" linguistic prosody task. Older adults were less accurate at interpreting prosodic emotion cues and nonemotional contours, concurrent semantic processing worsened interpretation, and performance was further degraded when identifying negative emotions and questions. Older adults display a pervasive problem interpreting prosodic cues, but further study is required to clarify the stage at which performance declines.
Article
Full-text available
It is widely accepted that hearing loss increases markedly with age, beginning in the fourth decade (ISO 7029, 2000). Age-related hearing loss is typified by high-frequency threshold elevation and associated reductions in speech perception because speech sounds, especially consonants, become inaudible. Nevertheless, older adults often report additional and progressive difficulties in the perception and comprehension of speech, often highlighted in adverse listening conditions, that exceed those reported by younger adults with a similar degree of high-frequency hearing loss (Dubno, Dirks, & Morgan), leading to communication difficulties and social isolation (Weinstein & Ventry). Some of the age-related decline in speech perception can be accounted for by peripheral sensory problems, but cognitive aging can also be a contributing factor. In this article, we review findings from the psycholinguistic literature, predominantly over the last four years, and present a pilot study illustrating how normal age-related changes in cognition and the linguistic context can influence speech-processing difficulties in older adults. For significant progress to be made in understanding and improving the auditory performance of aging listeners, we discuss how future research will have to be much more specific, not only about which interactions between auditory and cognitive abilities are critical but also about how they are modulated in the brain.
Article
Full-text available
Arcsine or angular transformations have been used for many years to transform proportions to make them more suitable for statistical analysis. A problem with such transformations is that the arcsines do not bear any obvious relationship to the original proportions. For this reason, results expressed in arcsine units are difficult to interpret. In this paper a simple linear transformation of the arcsine transform is suggested. This transformation produces values that are numerically close to the original percentage values over most of the percentage range while retaining all of the desirable statistical properties of the arcsine transform.
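The linear rescaling described here is commonly known as the rationalized arcsine unit (RAU) transform. A minimal Python sketch, assuming the two-term arcsine form for x correct responses out of n trials and the widely cited 146/π and −23 rescaling constants:

```python
import math

def rationalized_arcsine(x, n):
    """Map x correct responses out of n trials to rationalized arcsine
    units (RAU): values stay numerically close to percent correct over
    most of the range while retaining the variance-stabilizing
    property of the arcsine transform."""
    # Two-term arcsine transform (result in radians).
    theta = (math.asin(math.sqrt(x / (n + 1)))
             + math.asin(math.sqrt((x + 1) / (n + 1))))
    # Linear rescaling so the units resemble the original percentages.
    return (146.0 / math.pi) * theta - 23.0
```

Note that a score of 50% correct maps exactly to 50 RAU, while extreme scores spread out beyond the 0–100 range (e.g., 0/100 maps to roughly −18 RAU), which is what stabilizes variance near the floor and ceiling.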
Article
Full-text available
Young and elderly adults heard recorded words that had been computer-edited from connected speech so as to be heard in isolation from their linguistic surround. Word identification was tested for words in isolation and when heard with increasing amounts of linguistic context that had either preceded or followed them in their original utterances. Although the elderly subjects were poorer in identifying the words in isolation compared to young adults, both age groups showed similar increases in correct word identification as increasing amounts of prior context were presented. By contrast, context that followed the target words was less effective for the elderly subjects than it was for the young. It is argued that a memory trace of the unclear stimulus must be maintained for effective utilization of following context in a retrospective analysis. The elderly subjects' relative inability to utilize following context implicates an age-related memory deficit operating at the sentence level.
Article
Full-text available
This study reports a meta-analysis comparing the size of semantic priming effects on young and older adults' lexical decision and pronunciation latency. The analysis included 15 studies with 49 conditions varying the semantic relatedness of a prime stimulus (single word or whole sentence) and a target word. An effect-size analysis on the difference between young and older adults' semantic priming effect (unrelated minus related latency) indicated that semantic priming effects are reliably larger for older than for young adults. There was no evidence for nonhomogeneity in this age difference across the different conditions. The relationship between young and older adults' semantic priming effects was described by a function with a positive intercept and a slope of 1.0. This pattern of findings favors aging models postulating process-specific slowing rather than general cognitive slowing.
Article
Full-text available
Word recognition in sentences with and without context was measured in young and aged subjects with normal but not identical audiograms. Benefit derived from context by older adults has been obscured, in part, by the confounding effect of even mildly elevated thresholds, especially as listening conditions vary in difficulty. This problem was addressed here by precisely controlling signal-to-noise ratio across conditions and by accounting for individual differences in signal-to-noise ratio. Pure-tone thresholds and word recognition were measured in quiet and threshold-shaped maskers that shifted quiet thresholds by 20 and 40 dB. Word recognition was measured at several speech levels in each condition. Threshold was defined as the speech level (or signal-to-noise ratio) corresponding to the 50 rau point on the psychometric function. As expected, thresholds and slopes of psychometric functions were different for sentences with context compared to those for sentences without context. These differences were equivalent for young and aged subjects. Individual differences in word recognition among all subjects, young and aged, were accounted for by individual differences in signal-to-noise ratio. With signal-to-noise ratio held constant, word recognition for all subjects remained constant or decreased only slightly as speech and noise levels increased. These results suggest that, given equivalent speech audibility, older and younger listeners derive equivalent benefit from context.
Article
Understanding speech in noise is one of the most complex activities encountered in everyday life, relying on peripheral hearing, central auditory processing, and cognition. These abilities decline with age, and so older adults are often frustrated by a reduced ability to communicate effectively in noisy environments. Many studies have examined these factors independently; in the last decade, however, the idea of the auditory-cognitive system has emerged, recognizing the need to consider the processing of complex sounds in the context of dynamic neural circuits. Here, we use structural equation modeling to evaluate interacting contributions of peripheral hearing, central processing, cognitive ability, and life experiences to understanding speech in noise. We recruited 120 older adults (ages 55 to 79) and evaluated their peripheral hearing status, cognitive skills, and central processing. We also collected demographic measures of life experiences, such as physical activity, intellectual engagement, and musical training. In our model, central processing and cognitive function predicted a significant proportion of variance in the ability to understand speech in noise. To a lesser extent, life experience predicted hearing-in-noise ability through modulation of brainstem function. Peripheral hearing levels did not significantly contribute to the model. Previous musical experience modulated the relative contributions of cognitive ability and lifestyle factors to hearing in noise. Our models demonstrate the complex interactions required to hear in noise and the importance of targeting cognitive function, lifestyle, and central auditory processing in the management of individuals who are having difficulty hearing in noise.
Article
Chinese is a tonal language in which variation in pitch is used to distinguish word meanings. Thus, in order to understand a word, listeners have to extract the pitch patterns in addition to its phonemes. Can the correct word meaning still be accessed in sentence contexts if pitch patterns of words are altered? If so, how is this accomplished? The present study attempts to address such questions with event-related functional magnetic resonance imaging (fMRI). Native speakers of Mandarin Chinese listened to normal and pitch-flattened (monotone) speech inside the scanner. The behavioral results indicated that they rated monotone sentences as intelligible as normal sentences, and performed equally well in a dictation test on the two types of sentences. The fMRI results showed that both types of sentences elicited similar activation in the left insular, middle and inferior temporal gyri, but the monotone sentences elicited greater activation in the left planum temporale (PT) compared with normal sentences. These results demonstrate that lexical meaning can still be accessed in pitch-flattened Chinese sentences, and that this process is realized by automatic recovery of the phonological representations of lexical tones from the altered tonal patterns. Our findings suggest that the details of spoken pitch patterns are not essential for adequate lexical-semantic processing during sentence comprehension even in tonal languages like Mandarin Chinese, given that listeners can automatically use additional neural and cognitive resources to recover distorted tonal patterns in sentences.
Article
Listening in noisy situations is a challenging experience for many older adults. The authors hypothesized that older adults exert more listening effort compared with young adults. Listening effort involves the attention and cognitive resources required to understand speech. The purpose was (a) to quantify the amount of listening effort that young and older adults expend when they listen to speech in noise and (b) to examine the relationship between self-reported listening effort and objective measures. A dual-task paradigm was used to objectively evaluate the listening effort of 25 young and 25 older adults. The primary task involved a closed-set sentence-recognition test, and the secondary task involved a vibrotactile pattern recognition test. Participants performed each task separately and concurrently under 2 experimental conditions: (a) when the level of noise was the same and (b) when baseline word recognition performance did not differ between groups. Older adults expended more listening effort than young adults under both experimental conditions. Subjective estimates of listening effort did not correlate with any of the objective dual-task measures. Older adults require more processing resources to understand speech in noise. Dual-task measures and subjective ratings tap different aspects of listening effort.
Article
Two experiments using the materials of the Revised Speech Perception in Noise (SPIN-R) Test [Bilger et al., J. Speech Hear. Res. 27, 32-48 (1984)] were conducted to investigate age-related differences in the identification and the recall of sentence-final words heard in a babble background. In experiment 1, the level of the babble was varied to determine psychometric functions (percent correct word identification as a function of S/N ratio) for presbycusics, old adults with near-normal hearing, and young normal-hearing adults, when the sentence-final words were either predictable (high context) or unpredictable (low context). Differences between the psychometric functions for high- and low-context conditions were used to show that both groups of old listeners derived more benefit from supportive context than did young listeners. In experiment 2, a working memory task [Daneman and Carpenter, J. Verb. Learn. Verb. Behav. 19, 450-466 (1980)] was added to the SPIN task for young and old adults. Specifically, after listening to and identifying the sentence-final words for a block of n sentences, the subjects were asked to recall the last n words that they had identified. Old subjects recalled fewer of the items they had perceived than did young subjects in all S/N conditions, even though there was no difference in the recall ability of the two age groups when sentences were read. Furthermore, the number of items recalled by both age groups was reduced in adverse S/N conditions. The results were interpreted as supporting a processing model in which reallocable processing resources are used to support auditory processing when listening becomes difficult either because of noise, or because of age-related deterioration in the auditory system.
Because of this reallocation, these resources are unavailable to more central cognitive processes such as the storage and retrieval functions of working memory, so that "upstream" processing of auditory information is adversely affected.
Article
This study investigated factors that contribute to deficits of elderly listeners in recognizing speech that is degraded by temporal waveform distortion. Young and elderly listeners with normal hearing sensitivity and with mild-to-moderate, sloping sensorineural hearing losses were evaluated. Low-predictability (LP) sentences from the Revised Speech Perception in Noise test (R-SPIN) (Bilger, Nuetzel, Rabinowitz, & Rzeczkowski, 1984) were presented to subjects in undistorted form and in three forms of distortion: time compression, reverberation, and interruption. Percent-correct recognition scores indicated that age and hearing impairment contributed independently to deficits in recognizing all forms of temporally distorted speech. In addition, subjects' auditory temporal processing abilities were assessed on duration discrimination and gap detection tasks. Canonical correlation procedures showed that some of the suprathreshold temporal processing measures, especially gap duration discrimination, contributed to the ability to recognize reverberant speech. The overall conclusion is that age-related factors other than peripheral hearing loss contribute to diminished speech recognition performance of elderly listeners.
Article
This study is part of ongoing efforts to characterize and determine the neural bases of presbycusis. These efforts utilize humans and animals in sets of overlapping hypotheses and experiments. Here, 50 young adult and elderly subjects, with normal audiometric thresholds or high-frequency hearing loss, were presented three types of linguistic materials at suprathreshold levels to determine speech recognition performance in noise. The study sought to determine how peripheral and central auditory system dysfunctions might be implicated in the speech recognition problems of elderly humans. There were four main findings. (1) Peripheral auditory nervous system pathologies, manifested as reduced sensitivity for speech-frequency pure tones and speech materials, contribute to elevated speech reception thresholds in quiet, and to reduced speech recognition in noise. (2) Good cognitive ability was demonstrated in the old subjects, who took advantage of supportive context as well as or better than young subjects, strongly indicating that the cortical portions of the speech/language nervous system did not account for the speech understanding dysfunctions of the old subjects. (3) When audibility and cognitive functioning were not affected, the demonstrated speech-recognition-in-noise dysfunction remained in old subjects. This implicates auditory brainstem or auditory cortex temporal-resolution dysfunctions in accounting for the observed differences in speech processing. (4) Performance differences between young and elderly subjects with elevated thresholds illustrate the effects of age plus hearing loss and thereby implicate both peripheral and central dysfunctions in presbycusics. This is because the differences in performance between young and elderly subjects with normal peripheral sensitivity identified a central auditory dysfunction.
Article
In 2 experiments, young and older adults heard target speech presented in quiet or with a competing speaker in the background. The distractor consisted either of meaningful speech or nonmeaningful speech composed of randomly ordered word strings (Experiment 1) or speech in an unfamiliar language (Experiment 2). Tests of recall for the target speech showed that older adults, but not younger adults, were impaired more by meaningful distractors than by nonmeaningful distractors. However, on a surprise recognition test, young adults were more likely than older adults to recognize meaningful distractor items. These results suggest that reduced efficiency in attentional control is an important factor in older adults' difficulty in recalling target speech in the presence of a background of competing speech.
Article
Positron emission tomography (PET) was used to investigate the neural basis of the comprehension of speech in unmodulated noise ("energetic" masking, dominated by effects at the auditory periphery), and when presented with another speaker ("informational" masking, dominated by more central effects). Each type of signal was presented at four different signal-to-noise ratios (SNRs) (+3, 0, -3, -6 dB for the speech-in-speech, +6, +3, 0, -3 dB for the speech-in-noise), with listeners instructed to listen for meaning to the target speaker. Consistent with behavioral studies, there was SNR-dependent activation associated with the comprehension of speech in noise, with no SNR-dependent activity for the comprehension of speech-in-speech (at low or negative SNRs). There was, in addition, activation in bilateral superior temporal gyri which was associated with the informational masking condition. The extent to which this activation of classical "speech" areas of the temporal lobes might delineate the neural basis of the informational masking is considered, as is the relationship of these findings to the interfering effects of unattended speech and sound on more explicit working memory tasks. This study is a novel demonstration of candidate neural systems involved in the perception of speech in noisy environments, and of the processing of multiple speakers in the dorso-lateral temporal lobes.
Article
A common complaint of many older adults is difficulty communicating in situations where they must focus on one talker in the presence of other people speaking. In listening environments containing multiple talkers, age-related changes may be caused by increased sensitivity to energetic masking, increased susceptibility to informational masking (e.g., confusion between the target voice and masking voices), and/or cognitive deficits. The purpose of the present study was to tease out these contributions to the difficulties that older adults experience in speech-on-speech masking situations. Groups of younger, normal-hearing individuals and older adults with varying degrees of hearing sensitivity (n = 12 per group) participated in a study of sentence recognition in the presence of four types of maskers: a two-talker masker consisting of voices of the same sex as the target voice, a two-talker masker of voices of the opposite sex as the target, a signal-envelope-modulated noise derived from the two-talker complex, and a speech-shaped steady noise. Subjects also completed a voice discrimination task to determine the extent to which they were able to incidentally learn to tell apart the target voice from the same-sex masking voices and to examine whether this ability influenced speech-on-speech masking. Results showed that older adults had significantly poorer performance in the presence of all four types of maskers, with the largest absolute difference for the same-sex masking condition. When the data were analyzed in terms of relative group differences (i.e., adjusting for absolute performance) the greatest effect was found for the opposite-sex masker. Degree of hearing loss was significantly related to performance in several listening conditions. Some older subjects demonstrated a reduced ability to discriminate between the masking and target voices; performance on this task was not related to speech recognition ability. 
The overall pattern of results suggests that although amount of informational masking does not seem to differ between older and younger listeners, older adults (particularly those with hearing loss) evidence a deficit in the ability to selectively attend to a target voice, even when the masking voices are from talkers of the opposite sex. Possible explanations for these findings include problems understanding speech in the presence of a masker with temporal and spectral fluctuations and/or age-related changes in cognitive function.
Article
Older adults are known to benefit from supportive context in order to compensate for age-related reductions in perceptual and cognitive processing, including when comprehending spoken language in adverse listening conditions. In the present study, we examine how younger and older adults benefit from two types of contextual support, predictability from sentence context and priming, when identifying target words in noise-vocoded sentences. In the first part of the experiment, benefit from context based on primarily semantic knowledge was evaluated by comparing the accuracy of identification of sentence-final target words that were either highly predictable or not predictable from the sentence context. In the second part of the experiment, benefit from priming was evaluated by comparing the accuracy of identification of target words when noise-vocoded sentences were either primed or not by the presentation of the sentence context without noise vocoding and with the target word replaced with white noise. Younger and older adults benefited from each type of supportive context, with the most benefit realized when both types were combined. Supportive context reduced the number of noise-vocoded bands needed for 50% word identification more for older adults than their younger counterparts.
Jiang et al. (2017). JASA Express Letters, http://dx.doi.org/10.1121/1.4979565, published online 3 April 2017.
Tun, P. A., O'Kane, G., and Wingfield, A. (2002). "Distraction by competing speech in younger and older listeners," Psychol. Aging 17, 453-467.