The roles of fundamental frequency contours and sentence context in Mandarin Chinese speech intelligibility

Jiuju Wang and Hua Shu
State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University,
Beijing 100875, China
wangjiuju@gmail.com, shuhua.bnu@gmail.com
Linjun Zhang a)
College of Chinese Studies, Beijing Language and Culture University, Beijing 100083,
China
Zhanglinjun75@gmail.com
Zhaoxing Liu
Beijing Chinese Language and Culture College, Beijing 100037, China
liuzhaoxing@bjhwxy.com
Yang Zhang
Department of Speech-Language-Hearing Sciences and Center for Neurobehavioral
Development, University of Minnesota, Minneapolis, Minnesota 55455
zhang470@umn.edu
Abstract: Flattening the fundamental frequency (F0) contours of Mandarin Chinese sentences reduces their intelligibility in noise but not in quiet. It is unclear, however, how the absence of the primary acoustic cue for lexical tones might be compensated for by the top-down information of sentence context. In this study, speech intelligibility was evaluated when participants listened to sentences and word lists with or without F0 variations in quiet and noise. The results showed that sentence context partially explained the unchanged intelligibility of monotonous Chinese sentences in quiet and further indicated that F0 variations and sentence context act in concert during speech comprehension.
© 2013 Acoustical Society of America
PACS numbers: 43.71.Gv [AC]
Date Received: January 9, 2013 Date Accepted: May 28, 2013
1. Introduction
Speech perception includes both top-down and bottom-up processes, and phonemic as well as lexical identification is heavily influenced by context in degraded conditions (Hickok and Poeppel, 2007). Although F0 contour, the primary prosodic feature, does not determine segmental consonant or vowel identity by itself, it has a variety of linguistic functions, including marking emphasis and phrase boundaries. Previous studies examining the effects of F0 contours on the intelligibility of English sentences produced by healthy (Wingfield et al., 1984; Laures and Weismer, 1999) or hearing-impaired speakers (Maassen and Povel, 1984) have found that the absence of F0 variation decreases the intelligibility of speech compared with normal F0 variation.
In tonal languages like Mandarin Chinese, lexical tone contrasts are phonologically as important as phonemic contrasts. That is, the F0 contours for Chinese lexical tones distinguish lexical meanings from otherwise identical strings of phonemes. However, recent studies with Mandarin Chinese materials surprisingly showed that monotonous sentences with flattened F0 contours were just as intelligible as normal sentences with natural F0 patterns in a quiet listening condition (Patel et al., 2010; Xu et al., 2013). Because these studies used only normal sentences as speech materials, it is impossible to determine which cues (e.g., the remaining secondary acoustic cues for lexical tones or higher-level semantic information based on the sentential context) are utilized to comprehend the speech when the normal pitch patterns are flattened.
a) Author to whom correspondence should be addressed.
J. Acoust. Soc. Am. 134 (1), July 2013
Wang et al.: JASA Express Letters [http://dx.doi.org/10.1121/1.4811159] Published Online 14 June 2013
The first motivation for the present study was to examine the role of sentence context in Mandarin Chinese intelligibility in a quiet listening condition. We manipulated the top-down information variable and obtained intelligibility scores using both normal and word list sentences with natural or flat F0 contours. If high-level semantic information plays an important role, the intelligibility of pitch-flattened word list “sentences” would be substantially lower than that of their normal sentence counterparts, because the word list sentences are designed to be semantically incoherent to remove the top-down contextual information. Conversely, if low-level acoustic information is the major determinant of speech perception in quiet, the intelligibility of word list sentences with flattened F0 contours would be comparable to that of the normal sentences.
For speech-in-noise listening conditions, previous studies have consistently demonstrated the importance of dynamic F0 contours to speech intelligibility regardless of whether the target language is a tonal language or not (Laures and Bunton, 2003; Binns and Culling, 2007; Patel et al., 2010; Miller et al., 2010). The results show that naturally varying F0 contours improve speech intelligibility in background noise compared with flat or inverted F0 contours. The explanations proposed for these findings maintain that dynamic changes in F0 direct the listener’s attention to the content words of the utterance and assist with the segmentation of words in continuous speech. Unchanging F0 lowers intelligibility of the utterance because it reduces the contrast between words and makes it more difficult to parse continuous speech into meaningful units (Laures and Bunton, 2003; Binns and Culling, 2007). As with the previous studies that examined the intelligibility of speech in quiet, the speech-in-noise investigations did not look into the contribution that semantic information provided by sentence context might make to speech intelligibility.
The objective of the present study was to investigate the role of sentence context and the interaction between sentence context and F0 contours during Mandarin Chinese speech comprehension in both quiet and noise. We hypothesized that sentence context would contribute to the intelligibility of Chinese speech in all conditions, which would partially account for the unimpaired intelligibility of pitch-flattened sentences in quiet. Furthermore, the different signal-to-noise ratios (SNRs) associated with the listening conditions (in quiet vs in noise) might modulate the interaction between sentence context and F0 contours.
2. Methods
2.1 Subjects
One hundred and fifty-six undergraduate participants from Beijing Normal University were recruited. Five participants were omitted from the final analysis: three reported a hearing disorder in subject screening, and the other two were omitted due to computer error during data collection. The remaining 151 participants were all native Chinese speakers between the ages of 18 and 23, and all had hearing sensitivity ≤20 dB hearing level for octave frequencies between 250 and 8000 Hz bilaterally. To avoid stimulus order effects, we adopted a mixed design with F0 contours (normal vs flat) and background noise (quiet, SNR = +5 dB, and SNR = −5 dB) as between-subject factors and sentence context as a within-subject factor. The subject distribution in the between-subject conditions was as follows: quiet/normal F0 condition, number of subjects (n) = 26; quiet/flat F0 condition, n = 25; SNR = +5 dB/normal F0 condition, n = 25; SNR = +5 dB/flat F0 condition, n = 24; SNR = −5 dB/normal F0 condition, n = 25; SNR = −5 dB/flat F0 condition, n = 26.
2.2 Materials
To manipulate the F0 and sentence context effects, four types of target sentences were created: normal sentences and word list sentences with naturally intonated or unnaturally monotonous contours. The normal sentences were 20 declarative Chinese sentences on a variety of topics, and each sentence comprised 5 to 9 words. Words from the entire pool of the normal sentences were pseudo-randomly selected to form the word list sentences, which were syntactically anomalous and semantically meaningless at the whole sentence level. They were matched in length (number of syllables) with the normal sentences. The normal sentences and word list sentences were read by a male native speaker of Chinese. Manipulation of F0 was done using Praat (Institute of Phonetic Sciences, University of Amsterdam; downloadable at www.praat.org). A flat F0 contour was created for each sentence at the sentence’s mean F0, and the resulting monotonous sentence was resynthesized using the PSOLA method (Fig. 1).
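The contour manipulation described above can be illustrated with a minimal sketch. It operates on a hypothetical frame-wise F0 track (values in Hz, with None marking unvoiced frames) and replaces every voiced value with the sentence mean. The function name, the track representation, and the choice to average over voiced frames in Hz are assumptions made for illustration; the study itself performed the manipulation and the PSOLA resynthesis in Praat, which this sketch does not reproduce.

```python
from statistics import mean
from typing import List, Optional, Sequence


def flatten_f0(track: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Return a copy of the F0 track with every voiced frame set to the mean F0.

    Unvoiced frames (None) are left untouched, so only the pitch contour
    changes while the voicing pattern of the sentence is preserved.
    """
    voiced = [f for f in track if f is not None]
    if not voiced:
        # Nothing to flatten in a fully unvoiced track.
        return list(track)
    flat_value = mean(voiced)
    return [flat_value if f is not None else None for f in track]


# Hypothetical F0 track for one short utterance (Hz; None = unvoiced frame).
example_track = [None, 110.0, 130.0, 150.0, None, 120.0, 90.0, None]
flat_track = flatten_f0(example_track)
```

In this example the five voiced frames average 120 Hz, so every voiced frame in flat_track carries that value while the unvoiced frames stay unvoiced.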
Consonant-misplaced sentences were used as masker stimuli. These sentences were constructed by replacing the onset consonant of each syllable in the normal sentences with another consonant, provided that the replacement did not violate the phonotactic rules of Chinese. These consonant-misplaced sentences were syntactically anomalous and unintelligible at both lexical and sentential levels. The masker sentences were read by a female native speaker of Chinese. The choice of a male target speaker and a female masking speaker was to enable the clear instruction “listen to the male speaker” to be used throughout, rather than to train the subjects on the identity of the target speaker (Scott et al., 2004).
Each target sentence was combined with masker noise at 2 SNR levels (+5 and −5 dB). The level of the target sentences was fixed at 70 dB sound pressure level, and the level of the competing speech masker varied around the level of the target speech. The masker speech was edited to be, on average, 1 s longer than the target speech (500 ms prior to the beginning and 500 ms at the end of the target sentence) so that no part of the speech target was unmasked.
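The mixing step can be sketched as follows, under stated assumptions: levels are taken as whole-signal RMS values, the target level is held fixed, and the masker is scaled to reach the requested SNR. The function names, the RMS-based level definition, and the synthetic sine signals are hypothetical; the text does not specify how levels were measured or which software was used for mixing.

```python
import math
from typing import List, Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def measured_snr_db(target: Sequence[float], masker: Sequence[float]) -> float:
    """SNR in dB, defined here as 20*log10(rms(target) / rms(masker))."""
    return 20.0 * math.log10(rms(target) / rms(masker))


def masker_gain_for_snr(target: Sequence[float], masker: Sequence[float],
                        snr_db: float) -> float:
    """Linear gain for the masker so that target vs scaled masker has snr_db."""
    return rms(target) / (rms(masker) * 10.0 ** (snr_db / 20.0))


def mix_with_masker(target: Sequence[float], masker: Sequence[float],
                    snr_db: float, sample_rate: int,
                    pad_s: float = 0.5) -> List[float]:
    """Mix a target with a masker that starts pad_s before and ends pad_s after it."""
    pad = int(round(pad_s * sample_rate))
    needed = len(target) + 2 * pad
    if len(masker) < needed:
        raise ValueError("masker too short to cover the padded target")
    segment = list(masker[:needed])
    gain = masker_gain_for_snr(target, segment, snr_db)
    # Silence before and after the target; the masker covers the whole span.
    padded_target = [0.0] * pad + list(target) + [0.0] * pad
    return [t + gain * m for t, m in zip(padded_target, segment)]


# Synthetic stand-ins for speech: a 1 s target and a 2 s masker at 1 kHz sampling.
sample_rate = 1000
target = [math.sin(2 * math.pi * 5 * n / sample_rate) for n in range(1000)]
masker = [0.3 * math.sin(2 * math.pi * 7 * n / sample_rate) for n in range(2000)]
mixed = mix_with_masker(target, masker, snr_db=-5.0, sample_rate=sample_rate)
```

Scaling the masker rather than the target mirrors the description above, in which the target level is fixed and the masker level varies around it.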
Fig. 1. Acoustic features of sample speech stimuli. Broadband spectrograms (SPG: 0 to 5 kHz), intensity envelopes (INT: 50 to 100 dB), and fundamental frequency contours (F0: 0 to 500 Hz) are displayed for (A) a normal sentence and its pitch-flattened counterpart and (B) a word list sentence and its pitch-flattened counterpart.
2.3 Procedure
Listeners were tested individually in a sound-attenuated booth facing a computer monitor. Stimuli were presented via loudspeakers (Edifier R18, Edifier Technology Co. Ltd., Beijing, China). Because sentence context was the only within-subject factor, each listener was presented with a total of 40 trials: 20 normal sentences and 20 word list sentences with natural or flat F0 contours in one background condition (quiet or one SNR level). Listeners were instructed that they would be listening to sentences in quiet or noise, and were asked to write down the words read by the male speaker. The task was self-paced; listeners pressed a key to advance from trial to trial. Each sentence could be heard only once. The first author scored the written responses. Incorrect or omitted words were annotated, and performance scores were checked by an independent auditor blind to the experiment. Practice sentences were provided before the experiment, sampling all conditions. After the practice block, the experimenter checked the readability of the participant’s handwriting.
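The keyword-correct scoring applied to the written responses (described in the Results) can be sketched as below. The matching rules are assumptions made for illustration: a keyword counts as correct if it appears anywhere in the response, and each response word can be credited only once. The function names and the romanized example words are hypothetical; in the study the responses were scored by hand.

```python
from typing import List, Sequence


def keyword_score(keywords: Sequence[str], response_words: Sequence[str]) -> float:
    """Percentage of target keywords that appear in the listener's response."""
    remaining = list(response_words)
    correct = 0
    for keyword in keywords:
        if keyword in remaining:
            # Credit each written word at most once.
            remaining.remove(keyword)
            correct += 1
    return 100.0 * correct / len(keywords)


def mean_score(scores: Sequence[float]) -> float:
    """Average percentage score across sentences or listeners."""
    return sum(scores) / len(scores)


# Hypothetical sentence with four keywords; the listener misses one of them.
keywords = ["laoshi", "jintian", "jiaoshi", "shangke"]
response = ["laoshi", "mingtian", "jiaoshi", "shangke"]
sentence_score = keyword_score(keywords, response)
listener_score = mean_score([sentence_score, 100.0])
```

Here three of the four keywords are reported, giving a sentence score of 75%; averaging it with a fully correct second sentence gives 87.5% for the listener.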
3. Results
Intelligibility was determined by a keyword-correct count (Scott et al., 2004). The number of correct keywords (content words, varied across sentences from 4 to 7) identified by each listener was counted and then converted to the percentage of the total number of words and averaged across listeners. A 2 × 2 × 3 repeated measures analysis of variance (ANOVA), with sentence context as the within-subject factor and F0 contours and background noise as the between-subject factors, was carried out. Results showed that the three main effects were all highly significant [sentence context: F(1, 149) = 414.06, p < 0.001, η² = 0.741; F0 contours: F(1, 149) = 203.48, p < 0.001, η² = 0.584; background noise: F(2, 148) = 377.53, p < 0.001, η² = 0.839], revealing that intelligibility was reduced by the lack of sentence context and natural contours, and in the presence of background noise (Fig. 2).
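As an illustrative check on how effect sizes of this kind relate to F statistics, the sketch below uses the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). Two assumptions are made and are not stated in the text: that the reported η² values are partial eta squared, and that the error term has 145 degrees of freedom (151 listeners minus six between-subject cells). Under those assumptions the reported F values reproduce the reported effect sizes; the degrees of freedom printed in the text are left as given.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumption of this sketch: 151 listeners minus 6 between-subject cells.
ASSUMED_DF_ERROR = 145

# (F value, effect degrees of freedom) for the three main effects in the text.
reported_f = {
    "sentence context": (414.06, 1),
    "F0 contours": (203.48, 1),
    "background noise": (377.53, 2),
}

effect_sizes = {
    name: round(partial_eta_squared(f, df, ASSUMED_DF_ERROR), 3)
    for name, (f, df) in reported_f.items()
}
# effect_sizes -> sentence context 0.741, F0 contours 0.584, background noise 0.839
```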
Further analyses of all possible 2- and 3-way interactions showed that they were all significant [p values < 0.001, η² ≥ 0.130 for all the 2-way interactions and p = 0.002, η² = 0.08 for the 3-way interaction]. The significant interactions revealed that intelligibility disproportionately decreased with increasing noise for flat-contour sentences versus natural-contour sentences as well as for word list sentences versus normal sentences. Furthermore, intelligibility degraded to a greater extent for word list sentences than normal sentences when the F0 patterns of the two types of sentences changed from natural contours to flat contours (Fig. 3). Bonferroni adjusted post hoc pairwise comparisons revealed that intelligibility was significantly different for most contrasting pairs of interest. The four exceptional pairs included normal sentences with natural contours versus normal sentences with flat contours in quiet, normal sentences with natural contours in quiet versus their counterparts in +5 dB SNR background noise, and word list sentences with natural contours versus normal sentences with flat contours both in quiet and in +5 dB SNR background noise.
4. Discussion
The present study aimed to investigate the roles of sentence context, F0 contour, background noise, and the interactions among these factors in the intelligibility of Mandarin Chinese speech. The significant main effects indicate that all three factors contribute to the intelligibility of Chinese speech. That is, the lack of sentence context, the lack of natural F0 variations, and the presence of background noise each lead to a decrease in intelligibility. The significant interactions indicate that the contributions that sentence context and F0 contours make to the intelligibility of Chinese speech are modulated by the background noise conditions. Specifically, when presented in quiet, normal sentences with flat F0 contours are as intelligible as their counterparts with natural F0 contours. However, when presented in noise, flattening the F0 contours of normal sentences dramatically reduced the intelligibility compared with the mild decrease in intelligibility for normal sentences with natural F0 contours. These results are consistent with previous findings (Patel et al., 2010; Xu et al., 2013), highlighting the importance of natural F0 contours for sentence intelligibility in noise. At the same time, word list sentences with natural F0 contours were less intelligible than their normal sentence counterparts whether presented in quiet or noise, highlighting the contribution of sentence context to speech intelligibility irrespective of the background conditions.
One issue of particular interest to the present study is to what extent the unimpaired intelligibility of sentences with flat F0 contours in quiet is attributable to the remaining phonetic cues or to semantic information from the rest of the sentence. Although F0 contour is the primary acoustic cue for lexical tone perception, other cues (e.g., duration and amplitude) are also utilized by native Chinese speakers to recognize the types of tones (Wise and Chong, 1957; Liu and Samuel, 2004). It is also well documented that semantic information facilitates spoken word recognition and speech comprehension (McClelland and Elman, 1986; Liu and Samuel, 2007). Because only two types of speech materials, i.e., normal sentences with and without natural F0 contours, were used in the previous studies (Patel et al., 2010; Xu et al., 2013), it was difficult to separate the effects of high-level semantic and low-level acoustic information on intelligibility. In the present study, the result that the absence of sentence context reduced intelligibility indicates that sentence context partially accounts for the unimpaired intelligibility of monotonous Chinese sentences in quiet. Furthermore, when both sentence context and natural F0 variations were removed, the speech material was least intelligible in both quiet and noise background conditions. This finding indicates that F0 variations and sentence context act in concert during Chinese speech comprehension. When word list sentences with natural F0 contours were directly compared with normal sentences with flat F0 contours, their intelligibility was similar in the quiet and +5 dB SNR noise conditions, whereas the intelligibility of word list sentences with natural F0 contours was slightly higher than that of the normal sentences with flat F0 contours in the −5 dB SNR noise condition. This result indicates that the relative importance of sentence context and F0 contours for speech intelligibility depends on background conditions.
Fig. 2. Word-report scores sorted by the main effects of factors. Error bars represent standard deviation across subjects.
Fig. 3. Word-report scores sorted by the interaction effects.
The test materials, especially the word list sentences and the sentences with flat F0 contours, were specially designed for this experiment. The lack of ecological validity of the materials may limit the interpretation of our results. However, in a recent study, Feng et al. (2012) found that although lexical tone recognition for sine-wave Chinese monosyllabic words is poor, the recognition accuracy for sine-wave sentences is very high, reflecting the compensation effect of contextual information when the tonal information is poor. Our results, together with the findings of Feng et al. (2012), indicate that the functional load of lexical tones on sentence comprehension is limited. That is, access to lexical meaning is possible in a sentence context when the surface pitch patterns of tones are altered, although lexical tones are as important as segmental phonemes in specifying the meaning of a word in a tone language.
In a recent fMRI study, Xu et al. (2013) found that Mandarin Chinese sentences with natural or flat F0 contours elicited similar activation in the lexical-semantic processing areas (e.g., the left insular, middle, and inferior temporal gyri) and at the same time, monotonous sentences elicited greater activation in the left planum temporale than sentences with natural F0 contours. These results demonstrate that lexical meaning can still be accessed in pitch-flattened Chinese sentences, and that this process is realized by automatic recovery of the phonological representations of lexical tones from the altered tonal patterns. Our results are consistent with the findings of Xu et al. (2013), supporting the models which include an important role for top-down information in guiding speech perception (e.g., Hickok and Poeppel, 2007), given that listeners can automatically use additional neural and cognitive resources to recover distorted tonal patterns in sentences. As there are significant interactions among the three factors (i.e., sentence context, F0 variation, and background noise) in the intelligibility of Chinese speech, further studies are needed to better understand the brain mechanisms involved in these cognitive processes.
Acknowledgments
The research was supported by grants from the Humanities and Social Sciences
Foundation (Projects for Young Scholars) of the Chinese Ministry of Education (Grant
No. 10YJCZH223) to L.J.Z., and from the Natural Science Foundation of China (Grant
No. 31271082), the Natural Science Foundation of Beijing (Grant No. 7132119), and the
Fundamental Research Fund for the Central Universities to H.S.
References and links
Binns, C., and Culling, J. F. (2007). “The role of fundamental frequency contours in the perception of
speech against interfering speech,” J. Acoust. Soc. Am. 122(3), 1765–1776.
Feng, Y. M., Xu, L., Zhou, N., Yang, G., and Yin, S. K. (2012). “Sine-wave speech recognition in a tonal
language,” J. Acoust. Soc. Am. 131(2), EL133–EL138.
Hickok, G., and Poeppel, D. (2007). “The cortical organization of speech processing,” Nat. Rev. Neurosci.
8(5), 393–402.
Laures, J. S., and Bunton, K. (2003). “Perceptual effects of a flattened fundamental frequency at the
sentence level under different listening conditions,” J. Commun. Disord. 36(6), 449–464.
Laures, J. S., and Weismer, G. (1999). “The effects of a flattened fundamental frequency on intelligibility
at the sentence level,” J. Speech Lang. Hear. Res. 42(5), 1148–1156.
Liu, S., and Samuel, A. G. (2004). “Perception of Mandarin lexical tones when F0 information is
neutralized,” Lang. Speech 47(2), 109–138.
Liu, S., and Samuel, A. G. (2007). “The role of Mandarin lexical tones in lexical access under different
contextual conditions,” Lang. Cognit. Processes 22(4), 566–594.
Maassen, B., and Povel, D. (1984). “The effect of correcting fundamental frequency on the intelligibility of
deaf speech and its interaction with temporal aspects,” J. Acoust. Soc. Am. 76, 1673–1681.
McClelland, J. L., and Elman, J. (1986). “The TRACE model of speech perception,” Cogn. Psychol. 18(1),
1–86.
Miller, S. E., Schlauch, R. S., and Watson, P. J. (2010). “The effects of fundamental frequency contour
manipulations on speech intelligibility in background noise,” J. Acoust. Soc. Am. 128(1), 435–443.
Patel, A. D., Xu, Y., and Wang, B. (2010). “The role of F0 variation in the intelligibility of Mandarin
sentences,” in Proceedings of Speech Prosody 2010 (Chicago, IL).
Scott, S. K., Rosen, S., Wickham, L., and Wise, R. J. (2004). “A positron emission tomography study of
the neural basis of informational and energetic masking effects in speech perception,” J. Acoust. Soc. Am.
115(2), 813–821.
Wingfield, A., Lombardi, L., and Sokol, S. (1984). “Prosodic features and the intelligibility of accelerated
speech: Syntactic versus periodic segmentation,” J. Speech Hear. Res. 27, 128–134.
Wise, C. M., and Chong, L. P.-H. (1957). “Intelligibility of whispering in a tone language,” J. Speech
Hear. Disord. 22(3), 335–338.
Xu, G., Zhang, L., Shu, H., Wang, X., and Li, P. (2013). “Access to lexical meaning in pitch-flattened
Chinese sentences: An fMRI study,” Neuropsychologia 51(3), 550–556.
... There are four lexical tones in Mandarin according to pitch pattern: high level (Tone 1), rising (Tone 2), low dipping (Tone 3), and falling (Tone 4) [1]. Patel Wang et al. [3] examined the importance of Mandarin lexical tones by flattening the fundamental frequency (F0) contours of Mandarin sentences. They found that the role of F0 information in Mandarin sentence recognition was important in noise but redundant in quiet for adults with normal hearing (NH). ...
... They found that the role of F0 information in Mandarin sentence recognition was important in noise but redundant in quiet for adults with normal hearing (NH). However, although Patel et al. [2] and Wang et al. [3] altered the primary cues (i.e., F0 information) that provide lexical tone information, their participants may have been able to utilize secondary cues (e.g., amplitude envelope) for lexical tone perception, as amplitude envelope correlates with F0 contours [4,5]. To address this issue, Chen et al. [6] used a text-tospeech (TTS) engine to flatten the lexical tone of each word in a sentence (i.e., Tone 1, high level). ...
... Their results showed that listeners with NH almost perfectly perceived the synthesized flat-tone sentences in quiet, but their sentence recognition declined substantially in speech-shaped noise (SSN) at 0 dB signal-to-noise ratio (SNR) (i.e., presentation level of the targeted sentence minus that of the noise) [6]. This was consistent with the results of Patel et al. [2] and Wang et al. [3], although different tone manipulation methods were used in these studies. ...
Preprint
Full-text available
In Chinese languages, tones are used to express the lexical meaning of words. It is therefore important to analyze the role of lexical tone in Chinese sentence recognition accuracy. There is a lack of research on the role of Cantonese lexical tones in sentence recognition accuracy. Therefore, this study examined the contribution of lexical tone information to Can-tonese sentence recognition accuracy and its cognitive correlates in adults with normal hearing (NH). A text-to-speech synthesis engine was used to synthesize Cantonese daily-use sentences with each word carrying an original or a flat lexical tone, which were then presented to 97 participants in quiet, in speech-shaped noise (SSN), and in two-talker babble (TTB) noise conditions. Both target sentences and noises were presented at 65 dB binau-rally via insert headphones. It was found that listeners with NH can almost perfectly recognize a daily-use Cantonese sentence with mismatched lexical tone information in quiet, while their sentence recognition decreases substantially in noise. The same finding was reported for Mandarin, which has a relatively simple tonal system, suggesting that the current results may be applicable to other tonal languages. In addition, working memory (WM) was significantly related to decline in sentence recognition score in the TTB but not in the SSN, when the lexical tones were mismatched. This finding can be explained using the Ease of Language Understanding model and suggests that those with higher WM are less likely to be affected by the degraded lexical information for perceiving daily-use sentences in the TTB.
... There are four lexical tones in Mandarin according to pitch pattern: high level (Tone 1), rising (Tone 2), low dipping (Tone 3), and falling (Tone 4) [1]. Patel Wang et al. [3] examined the importance of Mandarin lexical tones by flattening the fundamental frequency (F0) contours of Mandarin sentences. They found that the role of F0 information in Mandarin sentence recognition was important in noise but redundant in quiet for adults with normal hearing (NH). ...
... They found that the role of F0 information in Mandarin sentence recognition was important in noise but redundant in quiet for adults with normal hearing (NH). However, although Patel et al. [2] and Wang et al. [3] altered the primary cues (i.e., F0 information) that provide lexical tone information, their participants may have been able to utilize secondary cues (e.g., amplitude envelope) for lexical tone perception, as amplitude envelope correlates with F0 contours [4,5]. To address this issue, Chen et al. [6] used a text-tospeech (TTS) engine to flatten the lexical tone of each word in a sentence (i.e., Tone 1, high level). ...
... Their results showed that listeners with NH almost perfectly perceived the synthesized flat-tone sentences in quiet, but their sentence recognition declined substantially in speech-shaped noise (SSN) at 0 dB signal-to-noise ratio (SNR) (i.e., presentation level of the targeted sentence minus that of the noise) [6]. This was consistent with the results of Patel et al. [2] and Wang et al. [3], although different tone manipulation methods were used in these studies. ...
Article
Full-text available
In Chinese languages, tones are used to express the lexical meaning of words. It is therefore important to analyze the role of lexical tone in Chinese sentence recognition accuracy. There is a lack of research on the role of Cantonese lexical tones in sentence recognition accuracy. Therefore, this study examined the contribution of lexical tone information to Cantonese sentence recognition accuracy and its cognitive correlates in adults with normal hearing (NH). A text-to-speech synthesis engine was used to synthesize Cantonese daily-use sentences with each word carrying an original or a flat lexical tone, which were then presented to 97 participants in quiet, in speech-shaped noise (SSN), and in two-talker babble (TTB) noise conditions. Both target sentences and noises were presented at 65 dB binaurally via insert headphones. It was found that listeners with NH can almost perfectly recognize a daily-use Cantonese sentence with mismatched lexical tone information in quiet, while their sentence recognition decreases substantially in noise. The same finding was reported for Mandarin, which has a relatively simple tonal system, suggesting that the current results may be applicable to other tonal languages. In addition, working memory (WM) was significantly related to decline in sentence recognition score in the TTB but not in the SSN, when the lexical tones were mismatched. This finding can be explained using the Ease of Language Understanding model and suggests that those with higher WM are less likely to be affected by the degraded lexical information for perceiving daily-use sentences in the TTB.
... Third, even in the absence of lexical tone recognition, cues afforded by F0 variation may directly assist speech intelligibility under noisy conditions by focusing the attention of listeners on contextual words and aiding the parsing of continuous speech into meaningful units [30,[61][62][63][64]. Mandarin sentence intelligibility decreased if the F0 contours were flattened under noisy conditions, but not in clear conditions [32,61]. ...
... Third, even in the absence of lexical tone recognition, cues afforded by F0 variation may directly assist speech intelligibility under noisy conditions by focusing the attention of listeners on contextual words and aiding the parsing of continuous speech into meaningful units [30,[61][62][63][64]. Mandarin sentence intelligibility decreased if the F0 contours were flattened under noisy conditions, but not in clear conditions [32,61]. The significance of the dynamic F0 contours in terms of speech intelligibility also applies to non-tonal languages. ...
... However, more weights were placed in Region 1 under SAM white noise compared to SAM SSN, indicating that the spectral shape of the noise would impact the frequency-weighting functions of temporal envelope for Mandarin perception. Another reason was that F0 contour might be of higher importance to Mandarin sentence recognition in SAM white noise than in other listening environments [61,69]. ...
Article
Full-text available
Background Temporal envelope cues are conveyed by cochlear implants (CIs) to hearing loss patients to restore hearing. Although CIs could enable users to communicate in clear listening environments, noisy environments still pose a problem. To improve speech-processing strategies used in Chinese CIs, we explored the relative contributions made by the temporal envelope in various frequency regions, as relevant to Mandarin sentence recognition in noise. Methods Original speech material from the Mandarin version of the Hearing in Noise Test (MHINT) was mixed with speech-shaped noise (SSN), sinusoidally amplitude-modulated speech-shaped noise (SAM SSN), and sinusoidally amplitude-modulated (SAM) white noise (4 Hz) at a + 5 dB signal-to-noise ratio, respectively. Envelope information of the noise-corrupted speech material was extracted from 30 contiguous bands that were allocated to five frequency regions. The intelligibility of the noise-corrupted speech material (temporal cues from one or two regions were removed) was measured to estimate the relative weights of temporal envelope cues from the five frequency regions. Results In SSN, the mean weights of Regions 1–5 were 0.34, 0.19, 0.20, 0.16, and 0.11, respectively; in SAM SSN, the mean weights of Regions 1–5 were 0.34, 0.17, 0.24, 0.14, and 0.11, respectively; and in SAM white noise, the mean weights of Regions 1–5 were 0.46, 0.24, 0.22, 0.06, and 0.02, respectively. Conclusions The results suggest that the temporal envelope in the low-frequency region transmits the greatest amount of information in terms of Mandarin sentence recognition for three types of noise, which differed from the perception strategy employed in clear listening environments.
... Specifically, the presence of various types of interference such as broadband noise and one-/multi-talker babbles deteriorates speech intelligibility dramatically, but listeners are able to use semantic context to offset the detrimental effects to a great extent [11][12][13][14]. Similarly, semantic context is also used by listeners to aid speech recognition and comprehension when the speech signal itself is degraded [15,16]. Furthermore, when a degraded speech signal is presented against interference, listeners benefit even more from semantic context. ...
... Furthermore, when a degraded speech signal is presented against interference, listeners benefit even more from semantic context. For example, the intelligibility difference between acoustically degraded sentences and semantically unrelated words is much greater when they are presented in suboptimal listening backgrounds than in quiet, indicating that listeners rely more on the top-down semantic context to aid speech recognition and comprehension in adverse conditions [16][17][18]. ...
... Specifically, to assess the role of contextual semantic integration in speech recognition, we introduced acoustic manipulations in two speech-in-noise conditions, one with natural F0 contours kept in the target sentences presented against interfering background speech, and the other with flattened F0 contours that disrupted the critical cue for Chinese lexical tones for proper word recognition. This speech-in-noise test protocol with Mandarin Chinese materials was previously adopted in a number of studies on elementary school students, middle school students, and adults, including the elderly population [16,18,25], and the results demonstrate that greater auditory semantic integration at the sentence level is required to recognize the words in the F0-degraded condition. Furthermore, two statistical models, i.e., the multiplicative model (product of the two subskills) and additive model (sum of the two subskills), were tested to clarify how the subskills would predict the variance in reading comprehension. ...
Article
Full-text available
Theories of reading comprehension emphasize decoding and listening comprehension as two essential components. The current study aimed to investigate how Chinese character decoding and context-driven auditory semantic integration contribute to reading comprehension in Chinese middle school students. Seventy-five middle school students were tested. Context-driven auditory semantic integration was assessed with speech-in-noise tests in which the fundamental frequency (F0) contours of spoken sentences were either kept natural or acoustically flattened, with the latter requiring a higher degree of contextual information. Statistical modeling with hierarchical regression was conducted to examine the contributions of Chinese character decoding and context-driven auditory semantic integration to reading comprehension. Performance in Chinese character decoding and auditory semantic integration scores with the flattened (but not natural) F0 sentences significantly predicted reading comprehension. Furthermore, the contributions of these two factors to reading comprehension were better fitted with an additive model than with a multiplicative model. These findings indicate that reading comprehension in middle schoolers is associated with not only character decoding but also the listening ability to make better use of the sentential context for semantic integration in a severely degraded speech-in-noise condition. The results add to our understanding of multi-faceted reading comprehension in children. Future research could further address the age-dependent development and maturation of reading skills by examining and controlling other important cognitive variables, and apply neuroimaging techniques such as functional magnetic resonance imaging and electrophysiology to reveal the neural substrates and neural oscillatory patterns for the contribution of auditory semantic integration and the observed additive model to reading comprehension.
... Specifically, the presence of various types of interference such as broadband noise and one-/multi-talker babbles deteriorates speech intelligibility dramatically, but listeners are able to use semantic context to offset the detrimental effects to a great extent (Dubno et al., 2000;Scott, Rosen, Wickham, & Wise, 2004;Golestani, Rosen, & Scott, 2009;Calandruccio, Dhar, & Bradlow, 2010). Similarly, semantic context is also used by listeners to aid speech recognition and comprehension when the speech signal itself is degraded (e.g., Patel, Xu, & Wang, 2010;Wang, Shu, Zhang, Liu, & Zhang, 2013). Furthermore, when degraded speech signal is presented against interference, listeners benefit even more from semantic context. ...
... Furthermore, when degraded speech signal is presented against interference, listeners benefit even more from semantic context. For example, the intelligibility difference between acoustically degraded sentences and semantically unrelated words is much greater when they are presented in suboptimal listening backgrounds than in quiet, indicating that listeners rely more on the top-down semantic context to aid speech recognition and comprehension in adverse conditions (Binns & Culling, 2007;Wang et al., 2013;Jiang, Li, Shu, Zhang, & Zhang, 2017). ...
... Specifically, to assess the role of contextual semantic integration in speech recognition, we introduced acoustic manipulations in two speech-in-noise conditions, one with natural F0 contours kept in the target sentences presented against interfering background speech and the other with flattened F0 contour that disrupted the critical cue for Chinese lexical tones for proper word recognition. This speech-in-noise test protocol with Mandarin Chinese materials had been previously adopted in a number of studies on elementary school students, middle school students, and adults including the elderly population (Wang et al., 2013;Jiang et al., 2017;Zhou et al, 2017), and the results demonstrated that greater auditory semantic integration at the sentence level is required to recognize the words in the F0-degraded condition. Furthermore, two statistical models, i.e., the multiplicative model (product of the two subskills) and additive model (sum of the two subskills), were tested to clarify how the subskills would predict the variance in reading comprehension. ...
Preprint
Full-text available
Theories of reading comprehension emphasize decoding and listening comprehension as two essential components. The current study aimed to investigate how Chinese character decoding and context-driven auditory semantic integration contribute to reading comprehension in Chinese middle school students. Seventy-five middle school students were tested. Context-driven auditory semantic integration was assessed with speech-in-noise tests in which the fundamental frequency (F0) contours of spoken sentences were either kept natural or acoustically flattened, with the latter requiring a higher degree of contextual information. Statistical modelling with hierarchical regression was conducted to examine the contributions of Chinese character decoding and context-driven auditory semantic integration to reading comprehension. Performance on Chinese character decoding and auditory semantic integration scores with the flattened (but not natural) F0 sentences significantly predicted reading comprehension. Furthermore, the contributions of these two factors to reading comprehension were better fitted with an additive model than with a multiplicative model. These findings indicate that reading comprehension in middle schoolers is associated with not only character decoding but also the listening ability to make better use of the sentential context for semantic integration in a severely degraded speech-in-noise condition. The results add to our understanding of multi-faceted reading comprehension in children. Future research could further address the age-dependent development and maturation of reading skills by examining and controlling other important cognitive variables, and apply neuroimaging techniques such as functional magnetic resonance imaging to reveal the neural substrates for the contribution of auditory semantic integration and the observed additive model to reading comprehension.
... As for Chinese Mandarin (Throughout the paper, the term Chinese will be used to specifically refer to Mandarin Chinese), some pilot studies investigated the relationship between linguistic features and comprehensibility. Segment features were proven to contribute more to comprehensibility compared to intonation and rhythm [17,18], while lexical tone was shown to have no significant inhibitory effect on comprehension in quiet environments [19,20]. One of the shortcomings of these studies is that the research materials were mostly synthesized speech and the content was repetitive, which might affect comprehensibility judgement. ...
... Tone accuracy has a relatively strong correlation with comprehensibility, indicating that lexical tone has a strong impact on Chinese L2 comprehensibility rating. Previous research has shown that missing tone information does not impede understanding with the complete context in quiet [19,20], but tone features came out as the strongest correlation in the current study. The current result does not conflict with prior conclusions, because our result shows that tone has a strong inhibitory effect on comprehensibility when context information is not complete. ...
Conference Paper
Full-text available
The present study set out to investigate the crucial phonological and fluency aspects that influence listeners’ judgement of Chinese L2 comprehensibility. 180 speech samples elicited from 20 Urdu-speaking learners of Chinese were subjectively rated by native speakers of Chinese for comprehensibility scores, and then objectively analyzed in terms of phonological (segment error ratio, high & low FL error ratio, FL of segment substitutions, tone error ratio, and average maximum posterior possibility) and fluency features (articulation rate, pause ratio, FL of pause errors, and sentence length). The results showed that comprehensibility was significantly related to most of the features, and that tone error ratio had the strongest correlation with comprehensibility judgements (r = −.588). Furthermore, multiple linear regression analyses revealed that these features have a combined contribution to comprehensibility (r2 = .6286). This study offers: (1) empirical evidence for comprehensibility-related features of Chinese L2 speech, (2) an adaptation of Brown’s FL principle to Chinese, and (3) strong evidence that lexical tone errors greatly impair understanding when the segmental information is not sufficient.
... Another FL study based on mutual information of Chinese text and phonemes has found FLs of some tone contrasts are much larger than that of phoneme pairs [4]. In some other studies, lexical tone was shown to have no significant inhibitory effect on comprehension in quiet environments [5], [6]. Since understanding spoken language is often thought of as an interaction procedure between top-down information of linguistic context and bottom-up information of perceptual input [7], listeners can guess the content through segmental information. ...
Conference Paper
Full-text available
Previous studies have shown that missing lexical tone information in Chinese has no inhibitory effect on understanding in quiet environments. The current study set out to examine the importance of tone for speech comprehension when the contextual information is incomplete. The first experiment examined the correlation between second language (L2) speech comprehensibility and tone error ratio using 180 L2 speech samples with severe segmental and suprasegmental errors. The second experiment investigated the correlation between the functional load (FL) of missing tones and word choices for sentences whose pinyin sequences become ambiguous without tone information; 30 native Mandarin speakers were asked to transcribe 14 ambiguous level-tone sentences from dictation. The results showed that tone error ratio correlated strongly with comprehensibility, and that word choice was also significantly correlated with the FL of the tones, suggesting that tone is not redundant information in communication and that its ambiguity-resolution function can be measured through an FL model.
... Of course, context matters (J. Wang, Shu, Zhang, Liu, and Zhang 2013). When listeners can rely on context, they may be able to easily overcome some of the challenges that misleading tones (or foreign-accented speech) might otherwise present. ...
Chapter
Full-text available
This chapter discusses second language pronunciation of Mandarin from the perspective of the native Mandarin speakers who listen to it. For such listeners, second language Mandarin often bears a noticeable foreign accent. I will provide a framework for defining foreign accent and for distinguishing accented pronunciation from pronunciation errors. I will then review the results of research related to foreign-accented Mandarin and how it affects listeners’ judgments, comprehension, and the efficiency with which they process second language Mandarin speech. Naturally, lexical tones will receive special attention in this discussion.
... Many studies have focused on the contribution of F0 contours to the perception of Mandarin Chinese (e.g., Kong and Zeng, 2006; Krenmayr et al., 2011; Lee et al., 2013; Li et al., 2019; Wang and Xu, 2020). Using meaningful sentences as targets, Patel et al. (2010) demonstrated that meaningful Chinese sentences with flattened F0 contours were as intelligible as those with normal F0 patterns in a quiet environment; however, they were less intelligible under SSN or babble noise, as reported in several other studies (e.g., Wang et al., 2013; Chen et al., 2014; Li et al., 2019). ...
Article
Full-text available
Speech perception is essential for daily communication. Background noise or concurrent talkers, on the other hand, can make it challenging for listeners to track the target speech (i.e., cocktail party problem). The present study reviews and compares existing findings on speech perception and unmasking in cocktail-party listening environments in English and Mandarin Chinese. The review starts with an introduction section followed by related concepts of auditory masking. The next two sections review factors that release speech perception from masking in English and Mandarin Chinese, respectively. The last section presents an overall summary of the findings with comparisons between the two languages. Future research directions with respect to the difference in literatures on the reviewed topic between the two languages are also discussed.
Chapter
Fundamental frequency (F0), listening environment, and semantic context are three important factors for both tonal and non-tonal language intelligibility by native speakers. However, it remains unclear how these factors affect second language (L2) learners of Mandarin Chinese and whether there are differences between native and L2 Mandarin speakers. Through speech re-synthesis and sentence counterbalancing, this study investigated the possible effects of F0 (i.e., natural F0 versus flattened F0) on the intelligibility of Mandarin speech by L2 Mandarin learners from different proficiency levels in quiet and white noise conditions when controlling for sentence context. A mixed-effect statistical model confirmed the main effects of F0 contour, listening environment, and proficiency level. That is to say, the lack of natural F0 contour, the presence of noise, and the lower proficiency level would predict the reduction in intelligibility when adjusting for the other two variables. However, no significant interactions were found. Specifically, the hypothesis that flattened sentences are as intelligible as natural sentences for more advanced learners was not supported due to the change of experimental subjects from native speakers to L2 speakers. It was proposed that compared to native speakers, L2 speakers’ underdeveloped utilization of secondary cues and semantic contexts, due to a developing proficiency level, may lead to non-significant interactions. The finding of the effect of F0 on intelligibility also illustrates the importance of tone accuracy and diversifying L2 learners’ linguistic input in Chinese pronunciation teaching and learning.
Article
Full-text available
The current study pursues Ye & Connine's (1999) suggestion that tonal information is much more important when words are presented in context, than in isolation. Disyllabic Mandarin words were either presented normally, or with changes in their segmental and/or tonal structure. Critically, these items were presented in isolation, in sentence context, and in idioms; previous studies have not examined these issues in sentential context. In Experiment 1, native Mandarin speakers made lexical decisions about these items. In Experiment 2, the critical stimuli were presented in white noise, and the listeners’ task was to detect the vowels and the tones of the stimuli. The results supported a more important role for tonal cues when the stimulus is presented in context than when it is in isolation; this pattern depended on the task conditions, as suggested by Soto-Faraco et al. (2001), and Mattys et al. (2005).
Article
Full-text available
This study tested the importance of F0 variation for tone language comprehension. The intelligibility of Mandarin sentences with natural F0 contours was compared to the intelligibility of monotone (flat-F0) sentences created via speech resynthesis. In a quiet background, flat-F0 speech was just as intelligible as natural speech (about 94% intelligible), highlighting the robustness of the language comprehension system. However, when babble noise was added (0 dB SNR), flat-F0 speech was substantially less intelligible than natural speech (60% vs. 80% intelligible), indicating that F0 variation is very important for Mandarin sentence intelligibility in noise.
Article
Full-text available
It is hypothesized that in sine-wave replicas of natural speech, lexical tone recognition would be severely impaired due to the loss of F0 information, but the linguistic information at the sentence level could be retrieved even with limited tone information. Forty-one native Mandarin-Chinese-speaking listeners participated in the experiments. Results showed that sine-wave tone-recognition performance was on average only 32.7% correct. However, sine-wave sentence-recognition performance was very accurate, approximately 92% correct on average. Therefore the functional load of lexical tones on sentence recognition is limited, and the high-level recognition of sine-wave sentences is likely attributed to the perceptual organization that is influenced by top-down processes.
Article
Full-text available
Previous studies have documented that speech with flattened or inverted fundamental frequency (F0) contours is less intelligible than speech with natural variations in F0. The purpose of this present study was to further investigate how F0 manipulations affect speech intelligibility in background noise. Speech recognition in noise was measured for sentences having the following F0 contours: unmodified, flattened at the median, natural but exaggerated, inverted, and sinusoidally frequency modulated at rates of 2.5 and 5.0 Hz, rates shown to make vowels more perceptually salient in background noise. Five talkers produced 180 stimulus sentences, with 30 unique sentences per F0 contour condition. Flattening or exaggerating the F0 contour reduced key word recognition performance by 13% relative to the naturally produced speech. Inverting or sinusoidally frequency modulating the F0 contour reduced performance by 23% relative to typically produced speech. These results support the notion that linguistically incorrect or misleading cues have a greater deleterious effect on speech understanding than linguistically neutral cues.
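To make the "flattened at the median" condition concrete, here is a minimal sketch of what that manipulation does to an F0 track. It only operates on a list of per-frame F0 values and does not perform the speech resynthesis the studies relied on. The function name `flatten_f0`, the convention that unvoiced frames are coded as 0.0, and the example contour are all assumptions made for illustration.

```python
from statistics import median


def flatten_f0(contour_hz, unvoiced=0.0):
    """Replace every voiced frame with the median F0 of the voiced frames.

    Frames equal to `unvoiced` (an assumed coding for frames without pitch)
    are left unchanged, so only the pitch movement is removed.
    """
    voiced = [f0 for f0 in contour_hz if f0 != unvoiced]
    if not voiced:
        # Nothing to flatten if no frame carries pitch.
        return list(contour_hz)
    target = median(voiced)
    return [target if f0 != unvoiced else unvoiced for f0 in contour_hz]


# Hypothetical F0 track in Hz, with unvoiced frames at the edges and middle.
example = [0.0, 180.0, 200.0, 220.0, 0.0, 150.0, 140.0]
flat = flatten_f0(example)
print(flat)
```

In practice the flattened contour would be handed to a resynthesis tool so that timing and spectral characteristics stay the same while the pitch becomes monotone; that step is outside the scope of this sketch.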
Article
Chinese is a tonal language in which variation in pitch is used to distinguish word meanings. Thus, in order to understand a word, listeners have to extract the pitch patterns in addition to its phonemes. Can the correct word meaning still be accessed in sentence contexts if pitch patterns of words are altered? If so, how is this accomplished? The present study attempts to address such questions with event-related functional magnetic resonance imaging (fMRI). Native speakers of Mandarin Chinese listened to normal and pitch-flattened (monotone) speech inside the scanner. The behavioral results indicated that they rated monotone sentences as intelligible as normal sentences, and performed equally well in a dictation test on the two types of sentences. The fMRI results showed that both types of sentences elicited similar activation in the left insular, middle and inferior temporal gyri, but the monotone sentences elicited greater activation in the left planum temporale (PT) compared with normal sentences. These results demonstrate that lexical meaning can still be accessed in pitch-flattened Chinese sentences, and that this process is realized by automatic recovery of the phonological representations of lexical tones from the altered tonal patterns. Our findings suggest that the details of spoken pitch patterns are not essential for adequate lexical-semantic processing during sentence comprehension even in tonal languages like Mandarin Chinese, given that listeners can automatically use additional neural and cognitive resources to recover distorted tonal patterns in sentences.
Article
We describe a model called the TRACE model of speech perception. The model is based on the principles of interactive activation. Information processing takes place through the excitatory and inhibitory interactions of a large number of simple processing units, each working continuously to update its own activation on the basis of the activations of other units to which it is connected. The model is called the TRACE model because the network of units forms a dynamic processing structure called “the Trace,” which serves at once as the perceptual processing mechanism and as the system's working memory. The model is instantiated in two simulation programs. TRACE I, described in detail elsewhere, deals with short segments of real speech, and suggests a mechanism for coping with the fact that the cues to the identity of phonemes vary as a function of context. TRACE II, the focus of this article, simulates a large number of empirical findings on the perception of phonemes and words and on the interactions of phoneme and word perception. At the phoneme level, TRACE II simulates the influence of lexical information on the identification of phonemes and accounts for the fact that lexical effects are found under certain conditions but not others. The model also shows how knowledge of phonological constraints can be embodied in particular lexical items but can still be used to influence processing of novel, nonword utterances. The model also exhibits categorical perception and the ability to trade cues off against each other in phoneme identification. At the word level, the model captures the major positive feature of Marslen-Wilson's COHORT model of speech perception, in that it shows immediate sensitivity to information favoring one word or set of words over others. At the same time, it overcomes a difficulty with the COHORT model: it can recover from underspecification or mispronunciation of a word's beginning. 
TRACE II also uses lexical information to segment a stream of speech into a sequence of words and to find word beginnings and endings, and it simulates a number of recent findings related to these points. The TRACE model has some limitations, but we believe it is a step toward a psychologically and computationally adequate model of the process of speech perception.
Article
This study investigates the role of intonation for the intelligibility of deaf speech. The intonation contours of Dutch sentences spoken by deaf children were manipulated using digital signal processing techniques, including LPC analysis. Sentence intonation was corrected by replacing the original F0 contour of the deaf utterance with an artificial contour derived from a formalized intonation grammar. Three types of intonation corrections were produced, differing with respect to the underlying accent structure and the type of F0 movements used. The overall results show that intonation correction yields a small but significant improvement in intelligibility of 7% (from 20% to 27% words correctly identified). The largest gain is obtained after removal of over-accentuations. To evaluate the interaction with temporal aspects, intonation corrections were also implemented on temporally corrected sentences. Total growth in intelligibility due to these combined corrections amounts to 13%. Thus it is concluded that no dramatic gain in intelligibility may be expected if speech pathologists succeed in teaching their deaf pupils to have better control over the suprasegmental aspects of their speech.
Article
An experiment is reported in which subjects heard paragraph-length samples of time-compressed speech which were interrupted for intermediate reports either on a simple periodic basis or at points corresponding to sentence and major clause boundaries. The passages were spoken in a normal prosodic pattern, in list intonation, or were electronically processed to produce otherwise normal speech specifically deprived of pitch variation. Decrease in intelligibility scores with increasing speech rate was accompanied by a significant effect of place of interruption for report and of the prosodic pattern in which the passages were heard. Interactions among these variables were interpreted to suggest ways in which prosody ordinarily facilitates the determination of syntactic structure in connected speech.
Article
The purpose of this preliminary experiment was to evaluate the effect of a flattened fundamental frequency (F0) contour on sentence intelligibility. The perceptual dimension monotone pitch is frequently used to describe the speech of persons with dysarthria, and relatively flat F0 contours have been noted in several acoustic studies of dysarthria. To determine the independent effect of a flattened F0 contour on sentence intelligibility a resynthesis technique was used that held timing and spectral characteristics of utterances constant while allowing parametric control over successive pitch periods. Two male speakers produced low-probability utterances selected from the SPIN test, which were then resynthesized with a flattened F0 contour. Speech intelligibility was assessed using two measures: one involving word transcription and the other interval scaling. These measures were collected from 10 listeners. The results showed that both measures were significantly lower when the F0 contour was flattened, as compared with naturally varying contours. Several different explanations are proposed for this effect, which can and should be explored in greater detail using the resynthesis technique given the prominence of this characteristic in dysarthria.