ORIGINAL RESEARCH
published: 14 June 2016
doi: 10.3389/fpsyg.2016.00908
Edited by:
Chia-Ying Lee,
Academia Sinica, Taiwan
Reviewed by:
Lan Shuai,
Haskins Laboratories, USA
Yan Yu,
St. John’s University, USA
*Correspondence:
Hua Shu
shuhua@bnu.edu.cn;
Yang Zhang
zhanglab@umn.edu
Specialty section:
This article was submitted to
Language Sciences,
a section of the journal
Frontiers in Psychology
Received: 04 February 2016
Accepted: 01 June 2016
Published: 14 June 2016
Citation:
Zhang L, Li Y, Wu H, Li X, Shu H,
Zhang Y and Li P (2016) Effects
of Semantic Context
and Fundamental Frequency
Contours on Mandarin Speech
Recognition by Second Language
Learners. Front. Psychol. 7:908.
doi: 10.3389/fpsyg.2016.00908
Effects of Semantic Context and
Fundamental Frequency Contours on
Mandarin Speech Recognition by
Second Language Learners
Linjun Zhang1, Yu Li2, Han Wu3, Xin Li1, Hua Shu4*, Yang Zhang5* and Ping Li6
1Faculty of Linguistic Sciences and KIT-BLCU MEG Laboratory for Brain Science, Beijing Language and Culture University,
Beijing, China, 2Department of Cognitive Science and ARC Centre of Excellence in Cognition and its Disorders, Macquarie
University, Sydney, NSW, Australia, 3Department of Sociology, Tsinghua University, Beijing, China, 4National Key Laboratory
of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing,
China, 5Department of Speech-Language-Hearing Sciences and Center for Neurobehavioral Development, University of
Minnesota, Minneapolis, MN, USA, 6Department of Psychology and Center for Brain, Behavior and Cognition, Pennsylvania
State University, State College, PA, USA
Speech recognition by second language (L2) learners in optimal and suboptimal
conditions has been examined extensively, with English as the target language in
most previous studies. This study extended existing experimental protocols (Wang
et al., 2013) to investigate Mandarin speech recognition by Japanese learners of
Mandarin at two levels of proficiency (elementary vs. intermediate). The overall
results showed that in addition to L2 proficiency, semantic context, F0 contours, and
listening condition all affected recognition performance on the Mandarin sentences.
However, the effects of semantic context and F0 contours on L2 speech recognition
diverged to some extent. Specifically, there was a significant modulation effect of
listening condition on semantic context, indicating that L2 learners made use of
semantic context less efficiently in the interfering background than in quiet. In
contrast, no significant modulation effect of listening condition on F0 contours was
found. Furthermore, there was a significant interaction between semantic context and
F0 contours, indicating that semantic context becomes more important for L2 speech
recognition when F0 information is degraded. None of these effects was modulated
by L2 proficiency. The discrepancy between the effects of semantic context and F0
contours on L2 speech recognition in the interfering background might be related to
differences in the processing capacities required by the two types of information in
adverse listening conditions.
Keywords: semantic context, fundamental frequency contours, speech recognition, second language (L2)
proficiency, interfering speech
INTRODUCTION
Speech recognition differences between native and non-native listeners have been examined
extensively in previous studies, which have consistently shown that adults learning a second
language (L2) are at a disadvantage in speech recognition compared with native listeners,
especially in adverse listening conditions (Gat and Keith, 1978; Mayo et al., 1997; Bradlow and
Bent, 2002; Bradlow and Alexander, 2007; Golestani et al., 2009; Oliver et al., 2012). One class
Frontiers in Psychology | www.frontiersin.org | June 2016 | Volume 7 | Article 908
of explanations locates the primary source of the sharp
decline in non-native listeners’ speech recognition in interfering
backgrounds at the segmental level. On this account, native
listeners have developed the ability to attend to segmental cues
that are less vulnerable to interferer-related distortions, such as
attending to formant transition cues to identify plosives when
stop-burst information is masked by interfering sounds (Parikh
and Loizou, 2005; Jiang et al., 2006). In contrast, non-native
listeners may not develop such processing flexibility and have to
attend primarily to cues that, while relatively reliable in quiet, are
largely obscured by interfering sounds (Cooke, 2006; Cutler et al.,
2008). By contrast, a different class of explanations attributes the
extra difficulty of non-native listeners under adverse listening
conditions to issues at higher processing levels. On such an
account, energetic and informational masking by interfering
sounds disrupts phonemic identification comparably for both
native and non-native listeners, but the more detrimental effects
are due to cumulative effects of interfering sounds on non-
native listeners’ access to and integration of various types of
linguistic information such as prosodic, semantic, syntactic and
pragmatic cues (Bradlow and Alexander, 2007; Golestani et al.,
2009; Calandruccio et al., 2010; Oliver et al., 2012).
In recent years, the explanations highlighting non-native
listeners’ deficiency in multiple linguistic cues have gained
support from a growing number of studies. For example, in
a direct test of the two classes of explanations discussed,
Cutler et al. (2004) used meaningless syllable-sized stimuli to
isolate segmental information from contextual information. The
results showed similar declines in phonemic identification with
decreasing signal-to-noise ratios (SNRs) for both native and
non-native listeners, indicating that interfering sounds do not
have disproportionate effects at the segmental level if lexical-
and sentence-level factors are irrelevant. These results dovetail
perfectly with the findings that non-native listeners benefited less
than native listeners from sentence-level contextual information
for word recognition in suboptimal conditions (Bradlow and
Alexander, 2007;Oliver et al., 2012).
While various types of linguistic information might contribute
to the disadvantage of non-native listeners in degraded speech
recognition, most of the previous studies focused on semantic
context with the influences of other factors, especially prosodic
cues, unexplored. As one of the most important prosodic cues,
fundamental frequency (F0) has many functions in speech. In
a tonal language like Chinese, F0 information is primarily used
to distinguish lexical meanings (Wang, 1973; Ho and Bryant,
1997). This contrasts with non-tonal languages such as English,
in which variation in F0 is mainly used to mark pragmatic
meanings such as emphasis, sentence modality (declarative vs.
interrogative), and emotion (Wang, 1973; Repp and Lin, 1990;
Cutler et al., 1997). There is ample evidence that native and
non-native Chinese speakers process F0 variations differently due
to the inherent differences in the linguistic/paralinguistic status of
pitch across the languages (Gandour et al., 2000; Xi et al., 2010;
Lin and Francis, 2014). Although tones have a lexical status in
Chinese, alteration of F0 patterns does not render words unintelligible
in all situations. For example, synthesized Mandarin
sentences with the original tone of each syllable replaced by a
flat tone or by a tone randomly selected from the four lexical tones
could be recognized nearly perfectly by native Mandarin speakers
in quiet (Wang et al., 2013; Chen et al., 2014). However, when
presented in adverse listening conditions such as in a background
of speech-shaped noise or against a competing talker, Mandarin
speech with altered F0 patterns was substantially less intelligible
than speech with natural F0 contours (Wang et al., 2013; Chen
et al., 2014). Together with findings of studies on non-tonal
languages, these results indicate that flattened or inverted F0
contours deteriorate speech intelligibility under adverse listening
conditions regardless of whether the target language is a tonal
language or not, but the effect is more detrimental for tonal
languages (Laures and Bunton, 2003;Binns and Culling, 2007;
Miller et al., 2010;Patel et al., 2010;Wang et al., 2013). The
explanations proposed for these findings maintain that dynamic
changes in F0 help with the separation of voiced/unvoiced speech
segments and segmentation of words in continuous speech,
and direct the listener’s attention to the content words of the
utterance. Flattening or inverting F0 contours lowers the intelligibility
of the utterance in adverse listening conditions because such
manipulations alter the contrast between words and make it
more difficult to parse continuous speech into meaningful units
(Laures and Bunton, 2003; Binns and Culling, 2007). This effect
is more detrimental for tonal languages than non-tonal languages
because in tonal languages original tones have to be recovered
and mapped onto the long-term phonological representations
before the lexical meaning is accessed. This process recruits
specific neural and cognitive resources in which the left planum
temporale and other brain areas participate (Xu et al., 2013).
It is well known that prosodic features and lexical tones are
rather difficult for L2 learners to acquire in both perception and
production (Wang et al., 1999; Pennington and Ellis, 2000; Wong
and Perrachione, 2007; Kang et al., 2010; Swerts and Zerbian,
2010; Huang and Jun, 2011). Therefore, L2 learners' deficient use
of prosodic cues, especially F0 information, might have
detrimental effects on non-native speech recognition,
particularly under adverse listening conditions.
Because previous studies on speech recognition by non-native
listeners only used sentences with normal F0 contours, it
is impossible to determine whether and to what extent F0
information affects speech recognition by non-native listeners.
The aim of the present study was to complement current
knowledge of speech recognition by L2 learners in several
respects. First, given that English has been the target
language in most previous studies, we examined Mandarin
speech recognition by L2 learners in quiet and in the presence
of an interfering single-talker speech background, aiming to
test the generalizability of the previous finding that non-native
listeners benefit less from semantic context under adverse
listening conditions than in quiet. Second, and more importantly,
we intended to explore the influence of F0 contours and the
possible interactions between F0 contours and other factors,
specifically semantic context and listening condition, on speech
recognition by L2 learners, which have rarely been examined
in previous research. Finally, we were interested in obtaining
evidence on whether the influences of F0 contours, semantic
context, and listening condition on speech recognition
jointly or interactively change over time with increasing L2
proficiency. Few previous studies have examined whether
L2 proficiency modulates the interaction effects among these
factors on L2 speech recognition.
MATERIALS AND METHODS
Subjects
Sixty undergraduate participants from Beijing Language and
Culture University were recruited. The participants were all
native Japanese speakers learning Mandarin as an L2: thirty
freshmen with elementary proficiency in Mandarin Chinese and
thirty juniors with intermediate proficiency. One participant
was excluded from the final analysis due to equipment failure
during data collection. The remaining 59 participants had hearing
thresholds at or below 20 dB hearing level for octave frequencies
between 250 and 8000 Hz bilaterally. Written
informed consent was obtained from all participants. The
study was approved by the Institutional Review Board (IRB)
of the National Key Laboratory of Cognitive Neuroscience and
Learning at Beijing Normal University. A mixed design was
adopted, with F0 contours (normal vs. flat) and Mandarin
proficiency (elementary vs. intermediate) as between-subject
factors and semantic context (sentence vs. word list) and
listening condition (quiet vs. SNR = +5 dB) as
within-subject factors. The participant distribution in the
between-subject conditions was as follows: elementary Mandarin
proficiency/normal F0, n = 15 (11 males, 4 females); elementary
Mandarin proficiency/flat F0, n = 14 (8 males, 6 females);
intermediate Mandarin proficiency/normal F0, n = 15 (10 males,
5 females); intermediate Mandarin proficiency/flat F0, n = 15
(9 males, 6 females).
Stimuli
In order to manipulate the F0 contours and semantic
context effects, four types of target sentences were created,
including normal sentences and word list sentences with
naturally intonated or unnaturally flattened contours. Similar
manipulations have been applied in our previous study (Wang
et al., 2013), but in the current study we adapted the previous
stimuli due to the L2 listeners’ limited proficiency in Mandarin.
The normal sentences were 28 declarative Chinese sentences,
each comprising 3–6 words that were familiar
to L2 learners at both levels of proficiency. Words from
the entire pool of the normal sentences were pseudo-randomly
selected to form the word list sentences, which were syntactically
anomalous and semantically meaningless at the whole sentence
level. They were matched in length (number of syllables) with the
normal sentences. The normal sentences and word list sentences
were read by a male native speaker of Chinese (pitch range:
90–236 Hz). Manipulation of F0 was done using Praat (Institute
of Phonetic Sciences, University of Amsterdam; downloadable at
www.praat.org). A flat F0 contour was created for each sentence
at the sentence’s mean F0 and the resulting monotonous sentence
was resynthesized using the PSOLA method (Figure 1).
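In essence, the flattening manipulation replaces the time-varying F0 track with its utterance-level mean before resynthesis. A minimal Python sketch of this step follows (illustrative only; the actual stimuli were resynthesized in Praat with PSOLA, and the frame values below are hypothetical):

```python
def flatten_f0(f0_track):
    """Replace every voiced frame of a per-frame F0 track (Hz) with the
    utterance's mean F0; 0.0 marks unvoiced frames and is left untouched."""
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        return list(f0_track)
    mean_f0 = sum(voiced) / len(voiced)
    return [mean_f0 if f > 0 else 0.0 for f in f0_track]

# Hypothetical pitch track spanning the target speaker's range (90-236 Hz):
track = [0.0, 120.0, 180.0, 236.0, 90.0, 0.0]
flat = flatten_f0(track)  # voiced frames all become the mean, 156.5 Hz
```

The resynthesis step, not shown here, then imposes the flattened track on the original waveform while preserving segmental content.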
As in our previous study of the Mandarin speech intelligibility
test on adult native speakers of Mandarin Chinese (Wang et al.,
2013), consonant-misplaced sentences were used as masker
stimuli in order to minimize the effects of informational masking,
because consonant-misplaced sentences were syntactically
anomalous and unintelligible at both lexical and sentential
levels (Xu et al., 2013).
FIGURE 1 | Acoustic features of sample speech stimuli. Broadband spectrograms (SPG: 0–5 kHz), intensity envelopes (INT: 50–100 dB), and fundamental frequency contours (F0: 0–300 Hz) are displayed for (A) a normal sentence and its pitch-flattened counterpart; (B) a word list sentence and its pitch-flattened counterpart.
These sentences were constructed
by replacing the onset consonant of each syllable in the
normal sentences with another consonant, provided that
the replacement did not violate the phonotactic rules of
Mandarin. The masker sentences were read by a female native
Mandarin speaker (pitch range: 159–351 Hz). The choice
of a male target speaker and a female masking speaker was
to enable the clear instruction “listen to the male speaker”
to be used throughout. Without this indexical information,
participants would have to be trained on the identity of the target
speaker.
Each target sentence was combined with masker noise at the
SNR level of +5 dB with the target and masker sentences fixed at
75 and 70 dB sound pressure level (SPL) respectively. The masker
speech was edited to be, on average, 500 ms longer than the target
speech in order to ensure that no part of the speech target was
unmasked.
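The fixed presentation levels (75 dB SPL target, 70 dB SPL masker) correspond to scaling the masker 5 dB below the target before mixing. A minimal sketch of such level-based mixing (illustrative Python; the function names and the synthetic signals are ours, not from the study):

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(target, masker, snr_db):
    """Scale the masker so that 20*log10(rms(target)/rms(scaled_masker))
    equals snr_db, then mix the two signals sample by sample
    (equal lengths are assumed for simplicity)."""
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    scaled = [gain * s for s in masker]
    mixture = [t + m for t, m in zip(target, scaled)]
    return mixture, scaled

# Hypothetical signals standing in for the target and masker speech:
target = [math.sin(0.11 * i) for i in range(2000)]
masker = [0.3 * math.sin(0.07 * i) for i in range(2000)]
mixture, scaled = mix_at_snr(target, masker, 5.0)  # +5 dB SNR mixture
```

In the actual experiment the masker was additionally extended 500 ms beyond the target rather than length-matched, so that no part of the target went unmasked.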
Procedures
We followed the same experimental procedures as in our previous
study (Wang et al., 2013). Specifically, listeners were tested
individually in a sound-attenuated booth with ambient noise level
below 15 dB(A). The stimuli were presented via loudspeakers
(Edifier R18, Edifier Technology, Co. Ltd., Beijing, China). The
sound level of the stimuli was calibrated to 65 dB SPL at the
subject’s head. Because semantic context and listening condition
were within-subject factors, each listener was presented with
a total of 56 trials—14 normal sentences and 14 word list
sentences in two listening conditions. Listeners were instructed
that they would be listening to sentences in a quiet or interfering
background, and were asked to write down the words read by the
male speaker in Pinyin (a system for transcribing Mandarin with
the Latin alphabet). The task was self-paced; listeners pressed
a key to advance from one trial to the next. Each sentence
was played once; participants were allowed to request one
repetition and were instructed to guess which
words they heard. Their written responses were recorded by one
author of the current study (LZ). Incorrect or omitted words
were annotated, and the scoring results were checked by an
independent researcher blind to the experiment (who is also
knowledgeable about phonology/linguistics). Practice sentences
(not used in the real experiment) were provided before the
experiment, which represented samples of all conditions. After
the practice block, the experimenter checked the readability of
the participant’s handwriting.
An answer was considered correct only when all the phonological
features, i.e., consonant, vowel, and tone, were correctly
identified. Speech recognition accuracy was determined by a
keyword-correct count (Scott et al., 2004; Wang et al., 2013;
Zhang et al., 2014). The number of keywords (content words,
varying from 3 to 5 across sentences) identified correctly by
each listener was counted, converted to a percentage of the total
number of keywords, and averaged across listeners.
A 2 × 2 × 2 × 2 repeated-measures analysis of variance (ANOVA)
was carried out, with semantic context and listening condition as
the within-subject factors and F0 contours and L2 proficiency as
the between-subject factors.
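Under this scheme, per-trial scoring reduces to counting exact keyword matches, where representing each word as a tone-marked Pinyin string makes a match require the correct consonant, vowel, and tone. An illustrative sketch (the function and the Pinyin items are hypothetical, not taken from the study's materials):

```python
def keyword_score(response_words, keywords):
    """Percentage of keywords reported correctly. Words are tone-marked
    Pinyin strings, so a match requires consonant, vowel, and tone."""
    reported = set(response_words)
    hits = sum(1 for w in keywords if w in reported)
    return 100.0 * hits / len(keywords)

# Hypothetical trial with 3 keywords; the response "shui4" carries a tone
# error against the keyword "shui3", so only 2 of 3 count as correct.
score = keyword_score(["wo3", "he1", "shui4"], ["wo3", "he1", "shui3"])
```

These per-listener percentages are then averaged across listeners within each condition before entering the ANOVA.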
Short-term memory was also measured by using the Chinese
version of forward digit span from the WAIS III (Wechsler,
1997). In this task, sets of 3–12 digits were presented to
participants at a rate of one digit per second, and participants
were asked to recall the digits in the same order in which they
heard them. The score was the length of the longest string that the
participant correctly reported.
RESULTS
Scores on the Chinese version of the forward digit span for the
four between-subject groups were as follows: 6.8 ± 2.0 and
7.5 ± 1.5 for the two groups with elementary Mandarin
proficiency; 7.2 ± 2.1 and 7.5 ± 1.9 for the two groups with
intermediate Mandarin proficiency. There were no significant
main effects or interaction effects across the groups [main effect
of L2 proficiency: F(1,55) = 0.236, p > 0.1, η² = 0.004; main
effect of F0 contours: F(1,55) = 2.791, p = 0.1, η² = 0.048;
interaction effect between L2 proficiency and F0 contours:
F(1,55) = 0.993, p > 0.1, η² = 0.018]. These results showed that
participants in the four between-subject groups were
well matched in Mandarin short-term memory, indicating
that differences in speech recognition accuracy across groups were
not due to discrepancies in short-term memory.
Speech recognition accuracy results showed that the four main
effects were all significant [semantic context: F(1,55) = 86.747,
p < 0.001, η² = 0.612; listening condition: F(1,55) = 60.181,
p < 0.001, η² = 0.522; F0 contours: F(1,55) = 6.512, p = 0.014,
η² = 0.106; L2 proficiency: F(1,58) = 41.707, p < 0.001,
η² = 0.431], revealing that all the factors influenced Mandarin
speech recognition when L2 listeners were engaged in the task.
Specifically, Mandarin speech recognition accuracy improved
with increasing L2 proficiency and was compromised by the
lack of sentence context, the lack of natural F0 contours, and the
presence of interfering speech. Further analyses of all
possible two-, three-, and four-way interactions revealed
no significant interactions other than the following two-way
interactions: the interaction between semantic context and
F0 contours [F(1,55) = 4.813, p = 0.032, η² = 0.08] revealed
that recognition of normal sentences degraded to a lesser extent
than that of word list sentences when the F0 patterns of the two types
of sentences changed from natural contours to flat contours;
the interaction between semantic context and listening
condition [F(1,55) = 17.982, p < 0.001, η² = 0.246] revealed
that recognition of normal sentences degraded to a greater extent
than that of word list sentences when the listening condition changed
from quiet to the interfering background (Figure 2). Four follow-up
simple effects tests on each of the significant interactions revealed
that recognition accuracy was significantly different for most
contrasting pairs [p-values < 0.0125, Bonferroni-corrected with
the significance threshold set at α = 0.05]. The exception was the
pair of normal sentences with natural contours vs. normal sentences
with flat contours [F(1,55) = 3.054, p = 0.086; see Appendix Tables A1
and A2 for the results of the simple effects tests].
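The reported effect sizes are consistent with partial eta squared recovered from each F statistic and its degrees of freedom, η²p = F·df_effect / (F·df_effect + df_error), as the following sketch verifies for the three F(1,55) effects (for L2 proficiency, the reported η² = 0.431 matches an error df of 55 rather than the printed 58, suggesting a typographical slip):

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom:
    eta_p^2 = F * df_effect / (F * df_effect + df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)

# Reproducing the reported effect sizes from the reported F(1,55) values:
context = round(partial_eta_squared(86.747, 1, 55), 3)    # 0.612
listening = round(partial_eta_squared(60.181, 1, 55), 3)  # 0.522
f0 = round(partial_eta_squared(6.512, 1, 55), 3)          # 0.106
```

This back-calculation is a useful sanity check when reading ANOVA tables, since F, df, and η²p are mutually constrained.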
FIGURE 2 | Word-report scores sorted by the main effects and interaction effects. (A) Main effects. (B) Significant two-way interaction effects. Error bars
represent one standard error. ∗p < 0.05, ∗∗∗p < 0.001.
DISCUSSION
The present study explored Mandarin speech recognition by
Japanese learners of Mandarin at elementary and intermediate
levels of proficiency. The overall results showed that all the
factors examined in the study, i.e., semantic context, F0
contours, listening condition and L2 proficiency, affected L2
speech recognition. As the primary goal of this study was to
assess the influences of semantic context and F0 contours on
Mandarin speech recognition by non-native listeners in adverse
listening conditions, our results revealed divergent patterns for
the two factors. Specifically, there was a significant modulation
effect of listening condition on semantic context, indicating
that L2 listeners benefited less from semantic context under
adverse listening conditions than in quiet. In contrast, there was
no significant modulation effect of listening condition on F0
contours. Furthermore, none of these effects was
modulated by L2 proficiency.
Previous work has shown that L2 learners are less accurate
than native listeners in speech recognition under adverse
listening conditions and this disadvantage is in large part ascribed
to the deficient use of semantic context by non-native listeners
(Mayo et al., 1997; Golestani et al., 2009; Oliver et al., 2012).
The current results extend these findings by demonstrating that
recognition of normal sentences degraded to a greater extent
than word list sentences when listening condition changed from
quiet to an interfering background for both groups of non-
native listeners. These results are in stark contrast to those of
our previous study with native Mandarin speakers (Wang et al.,
2013), in which recognition of word list sentences degraded to
a larger extent than normal sentences when listening condition
changed from quiet to interfering backgrounds, highlighting the
contribution of sentential-semantic context to speech recognition
in competing speech by native listeners. The inefficient use
of sentential-semantic context by non-native listeners against
interfering speech might be related to their insufficient
knowledge of and experience with the L2, which results in a
bottleneck of processing resources when listening to speech in
adverse listening conditions. Furthermore, it is widely agreed
that L1 processing is automatic, involving a system neurally
committed to the sound patterns of the native language (Zhang
et al., 2005; Zhang and Wang, 2007), whereas L2 processing is
highly controlled and effortful (Dekeyser, 2001; Segalowitz and
Hulstijn, 2005; Oliver et al., 2012), especially at elementary and
intermediate levels. Thus, it is possible that during non-native
speech recognition under adverse listening conditions, linguistic
and cognitive processing resources are lacking for an automatic
and efficient semantic processing. That is, the decrease in benefits
of semantic context during L2 speech recognition under adverse
listening conditions is due to capacity limitations of the non-native
listeners. The finding that the effect of semantic context was
not modulated by L2 proficiency is also consistent with the
results of Mayo et al. (1997) and Shi (2010), which showed that
even L2 listeners who have spoken the L2 since infancy cannot
reach monolingual speakers’ level of performance on auditory
sentence comprehension, especially under suboptimal listening
conditions.
Although F0 plays a crucial role in speech recognition by
native listeners under adverse listening conditions (Laures and
Bunton, 2003; Binns and Culling, 2007; Miller et al., 2010;
Patel et al., 2010; Wang et al., 2013), whether and how it
affects L2 speech recognition, especially for a tonal language,
has rarely been explored. The significant main effect of F0
contours and the absence of an interaction between F0 contours
and listening condition jointly revealed that natural F0 contours
contributed to speech recognition by L2 listeners in both
the quiet and interfering backgrounds. Furthermore, simple
effect tests on the interaction between semantic context and
F0 contours showed that recognition of word list sentences
degraded significantly when the F0 patterns changed from
natural contours to flat contours. The key difference between
the two types of word list sentences is whether there are
natural F0 patterns of the original tones. Taken together,
these results indicate that both sentence-level (prosody) and
word-level (lexical tone) F0 information could be used by L2
listeners during Mandarin speech recognition. The contribution
of F0 contours to Mandarin speech recognition by L2 learners
might be attributed to the development of sensitivity to
Mandarin-specific F0 patterns and reflect, at least in part, the
acquisition of lexical tones by non-native listeners at both
elementary and intermediate levels of Mandarin proficiency.
Neither the main effect of F0 contours nor the interaction
effect between F0 contours and semantic context was found
to be modulated by L2 proficiency. In this study, however,
even the L2 listeners with elementary Mandarin proficiency
had studied Mandarin for about 8 months. Whether and
how sentence-level (prosody) and word-level (lexical tone) F0
information can be used by L2 listeners with lower levels of
Mandarin proficiency needs to be further investigated. One
interesting issue that also needs further investigation is whether
the native language of the participants (Japanese, a pitch-accent
language that also uses variations in pitch to differentiate
words) affected our findings, especially the use of
F0 information during speech recognition in the interfering
background. Future studies can be designed to compare
Mandarin learners with different types of native languages,
e.g., Thai (tonal language) and English (non-tonal language) to
explore the generalizability of current results to Mandarin L2
speech recognition.
Interestingly, there was a significant interaction between
semantic context and F0 contours, i.e., recognition of normal
sentences degraded to a lesser extent than that of word list sentences
when the F0 patterns of the two types of sentences changed
from natural contours to flat contours. This result indicates
that L2 listeners benefited more from semantic context when
F0 information was degraded, in sharp contrast to
the finding that they benefited less from semantic context
in the interfering background than in quiet. The divergent
modulation effects of listening condition and F0 contours
on semantic context might be related to the differences
in the distortions caused by interfering speech and flat F0
contours to the target speech signals. Specifically, interfering
speech adds masking to the recognition task (Calandruccio
et al., 2010) while flattening F0 contours creates distortion
to the target speech itself (Miller et al., 2010;Wang et al.,
2013). A recent functional magnetic resonance imaging study
revealed that different neural and cognitive resources are
recruited during the processing of external (e.g., background
noise) and internal (e.g., accent) distortions during auditory
sentence comprehension with the bilateral inferior frontal areas
responsible for processing external distortions and the left
temporal areas responsible for processing internal distortions
(Adank et al., 2012). Future studies are needed to examine
whether distinct neural substrates are involved in L2 speech
recognition in the presence of interfering backgrounds and
absence of natural F0 information.
In this study, single-talker babble, specifically consonant-misplaced
sentences read by a female speaker, was used as the masker
stimulus. One concern is the potential effect of this
specific masker on the current findings. Different types of masking
stimuli such as single- and multi-talker babbles, white noise and
speech-shaped noise have been used in previous studies on speech
recognition in adverse listening conditions (Scott et al., 2004;
Binns and Culling, 2007; Bradlow and Alexander, 2007; Golestani
et al., 2009; Calandruccio et al., 2010; Patel et al., 2010; Oliver
et al., 2012). Various interfering sounds tend to produce different
masking effects; for example, single-talker interferers usually produce less masking than multi-talker babble. Furthermore, in the current study, a male voice was always the target and a female voice was always the masker. This gender difference in the materials could have affected our results, given that male voices generally have lower F0 than female voices. Further studies adopting a counterbalanced design, with either a female or a male voice serving as the target and with different types of interfering sounds, are needed to better understand Mandarin L2 speech recognition under suboptimal conditions.
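For illustration, speech-on-speech masking conditions of the kind used here are typically constructed by scaling the masker so that the target-to-masker power ratio reaches a chosen signal-to-noise ratio (SNR) before the two are mixed. The sketch below shows that general procedure; it is not the authors' actual stimulus code, and `mix_at_snr` is a hypothetical helper.

```python
import numpy as np

def mix_at_snr(target: np.ndarray, masker: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix target and masker waveforms at the requested SNR (in dB).

    The masker is rescaled so that RMS(target) / RMS(scaled masker)
    corresponds to the requested SNR; the target itself is left untouched.
    """
    n = min(len(target), len(masker))      # trim both to a common length
    target, masker = target[:n], masker[:n]
    rms_target = np.sqrt(np.mean(target ** 2))
    rms_masker = np.sqrt(np.mean(masker ** 2))
    # Gain that places the masker snr_db below (or above) the target level
    gain = (rms_target / rms_masker) / (10 ** (snr_db / 20.0))
    return target + gain * masker

# At 0 dB SNR the scaled masker ends up with the same RMS level as the target.
rng = np.random.default_rng(0)
mixture = mix_at_snr(rng.standard_normal(16000), rng.standard_normal(16000), 0.0)
```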
Frontiers in Psychology | www.frontiersin.org | June 2016 | Volume 7 | Article 908
CONCLUSION
Our results support the general finding in the literature that
non-native listeners benefit less from semantic context under
adverse listening conditions than in quiet and, at the same time,
provide initial evidence for the contribution of F0 contours, and for the modulatory effect of semantic context on F0 contours, during L2 speech recognition of a tonal language irrespective of listening background. The discrepancy in the influences of
semantic context and F0 contours on L2 speech recognition in
the interfering background might be related to differences in
processing capacities required by the two types of information in
adverse listening conditions.
AUTHOR CONTRIBUTIONS
LZ, YL, HS, YZ, and PL designed research; LZ, YL, HW, and XL
performed research; LZ, YL, and YZ analyzed data; LZ, HS, YZ,
and PL wrote the paper.
ACKNOWLEDGMENTS
We would like to thank Xianjun Tan for her assistance in data
collection. This research was supported by grants from the Faculty of Linguistics, the Science Foundation of Beijing Language and
Culture University (Fundamental Research Funds for the Central
Universities; 14YJ150003, 16WT02), the Program for New
Century Excellent Talents in University (NCET–13–0691) to LZ,
from the Natural Science Foundation of China (81461130018)
to HS, and in part from the US National Science Foundation
(BCS-1349110) to PL. YZ was additionally supported by a
Brain Imaging Research Project award from the University of
Minnesota.
SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.00908
REFERENCES
Adank, P., Davis, M. H., and Hagoort, P. (2012). Neural dissociation in processing
noise and accent in spoken language comprehension. Neuropsychologia 50,
77–84. doi: 10.1016/j.neuropsychologia.2011.10.024
Binns, C., and Culling, J. F. (2007). The role of fundamental frequency contours
in the perception of speech against interfering speech. J. Acoust. Soc. Am. 122,
1765–1776. doi: 10.1121/1.2751394
Bradlow, A. R., and Alexander, J. A. (2007). Semantic and phonetic enhancements
for speech-in-noise recognition by native and non-native listeners. J. Acoust.
Soc. Am. 121, 2339–2349. doi: 10.1121/1.2642103
Bradlow, A. R., and Bent, T. (2002). The clear speech effect for non-native listeners.
J. Acoust. Soc. Am. 112, 272–284. doi: 10.1121/1.1487837
Calandruccio, L., Dhar, S., and Bradlow, A. R. (2010). Speech-on-speech masking
with variable access to the linguistic content of the masker speech. J. Acoust. Soc.
Am. 128, 860–869. doi: 10.1121/1.3458857
Chen, F., Wong, L. L., and Hu, Y. (2014). Effects of lexical tone contour on
Mandarin sentence intelligibility. J. Speech Lang. Hear. Res. 57, 338–345. doi:
10.1044/1092-4388
Cooke, M. (2006). A glimpsing model of speech perception in noise. J. Acoust. Soc.
Am. 119, 1562–1573. doi: 10.1121/1.2166600
Cutler, A., Dahan, D., and van Donselaar, W. (1997). Prosody in the
comprehension of spoken language: a literature review. Lang. Speech 40, 141–
201. doi: 10.1177/002383099704000203
Cutler, A., Garcia Lecumberri, M. L., and Cooke, M. (2008). Consonant
identification in noise by native and nonnative listeners: effects of local context.
J. Acoust. Soc. Am. 124, 1264–1268. doi: 10.1121/1.2946707
Cutler, A., Weber, A., Smits, R., and Cooper, N. (2004). Patterns of English
phoneme confusions by native and non-native listeners. J. Acoust. Soc. Am. 116,
3668–3678. doi: 10.1121/1.1810292
Dekeyser, R. (2001). “Automaticity and automatization,” in Cognition and Second
Language Instruction, ed. P. Robinson (New York, NY: Cambridge University
Press), 125–151.
Gandour, J., Wong, D., Hsieh, L., Weinzapfel, B., Van Lancker, D., and Hutchins,
G. D. (2000). A crosslinguistic PET study of tone perception. J. Cogn. Neurosci.
12, 207–222. doi: 10.1162/089892900561841
Gat, I. N., and Keith, R. W. (1978). An effect of linguistic experience. Auditory word discrimination by native and non-native speakers of English. Audiology
17, 339–345. doi: 10.3109/00206097809101303
Golestani, N., Rosen, S., and Scott, S. K. (2009). Native-language benefit for
understanding speech-in-noise: the contribution of semantics. Biling. (Camb
Engl). 12, 385–392. doi: 10.1017/S1366728909990150
Ho, C. S., and Bryant, P. (1997). Development of phonological awareness of
Chinese children in Hong Kong. J. Psycholinguist. Res. 26, 109–126. doi:
10.1023/A:1025016322316
Huang, B. H., and Jun, S. A. (2011). The effect of age on the acquisition of
second language prosody. Lang. Speech 54, 387–414. doi: 10.1177/0023830911402599
Jiang, J., Chen, M., and Alwan, A. (2006). On the perception of voicing in
syllable-initial plosives in noise. J. Acoust. Soc. Am. 119, 1092–1105. doi:
10.1121/1.2149841
Kang, O., Rubin, D., and Pickering, L. (2010). Suprasegmental measures
of accentedness and judgments of language learner proficiency in oral English. Mod. Lang. J. 94, 554–566. doi: 10.1111/j.1540-4781.2010.01091.x
Laures, J. S., and Bunton, K. (2003). Perceptual effects of a flattened fundamental
frequency at the sentence level under different listening conditions. J. Commun.
Disord. 36, 449–464. doi: 10.1016/S0021-9924(03)00032-7
Lin, M., and Francis, A. L. (2014). Effects of language experience and expectations
on attention to consonants and tones in English and Mandarin Chinese.
J. Acoust. Soc. Am. 136, 2827–2838. doi: 10.1121/1.4898047
Mayo, L. H., Florentine, M., and Buus, S. (1997). Age of second-language
acquisition and perception of speech in noise. J. Speech Lang. Hear. Res. 40,
686–693. doi: 10.1044/jslhr.4003.686
Miller, S. E., Schlauch, R. S., and Watson, P. J. (2010). The effects of fundamental
frequency contour manipulations on speech intelligibility in background noise.
J. Acoust. Soc. Am. 128, 435–443. doi: 10.1121/1.3397384
Oliver, G., Gullberg, M., Hellwig, F., Mitterer, H., and Indefrey, P. (2012). Acquiring
L2 sentence comprehension: a longitudinal study of word monitoring
in noise. Biling. Lang. Cogn. 15, 841–857. doi: 10.1017/S1366728912000089
Parikh, G., and Loizou, P. C. (2005). The influence of noise on vowel and consonant
cues. J. Acoust. Soc. Am. 118, 3874–3888. doi: 10.1121/1.2118407
Patel, A. D., Xu, Y., and Wang, B. (2010). “The role of F0 variation in the
intelligibility of Mandarin sentences,” in Proceedings of Speech Prosody 2010,
Chicago, IL.
Pennington, M. C., and Ellis, N. C. (2000). Cantonese speakers’ memory for English sentences with prosodic cues. Mod. Lang. J. 84, 372–389. doi: 10.1111/0026-7902.00075
Repp, B. H., and Lin, H. B. (1990). Integration of segmental and tonal information
in speech perception: a cross-linguistic study. J. Phon. 18, 481–495. doi:
10.1121/1.2028239
Scott, S. K., Rosen, S., Wickham, L., and Wise, R. J. (2004). A positron
emission tomography study of the neural basis of informational and energetic
masking effects in speech perception. J. Acoust. Soc. Am. 115, 813–821. doi:
10.1121/1.1639336
Segalowitz, N., and Hulstijn, J. (2005). “Automaticity in bilingualism and second
language learning,” in Handbook of Bilingualism: Psycholinguistic Approaches,
eds J. F. Kroll and A. M. B. De Groot (Oxford: Oxford University Press),
371–388.
Shi, L. F. (2010). Perception of acoustically degraded sentences in bilingual listeners
who differ in age of English acquisition. J. Speech Lang. Hear. Res. 53, 821–835.
doi: 10.1044/1092-4388(2010/09-0081)
Swerts, M., and Zerbian, S. (2010). Intonational differences between L1 and L2 English in South Africa. Phonetica 67, 127–146. doi: 10.1159/000321052
Wang, J., Shu, H., Zhang, L., Liu, Z., and Zhang, Y. (2013). The roles of fundamental
frequency contours and sentence context in Mandarin Chinese speech
intelligibility. J. Acoust. Soc. Am. 134, EL91–EL97. doi: 10.1121/1.4811159
Wang, W. S. Y. (1973). The Chinese language. Sci. Am. 228, 50–60. doi:
10.1038/scientificamerican0273-50
Wang, Y., Spence, M. M., Jongman, A., and Sereno, J. A. (1999). Training American
listeners to perceive Mandarin tones. J. Acoust. Soc. Am. 106, 3649–3658. doi:
10.1121/1.428217
Wechsler, D. (1997). Wechsler Adult Intelligence Scale, Vol. 3. New York, NY:
Psychological Corporation.
Wong, P. C. M., and Perrachione, T. K. (2007). Learning pitch patterns in lexical
identification by native English-speaking adults. Appl. Psycholinguist. 28, 565–
585. doi: 10.1017/S0142716407070312
Xi, J., Zhang, L., Shu, H., Zhang, Y., and Li, P. (2010). Categorical perception of
lexical tones in Chinese revealed by mismatch negativity. Neuroscience 170,
223–231. doi: 10.1016/j.neuroscience.2010.06.077
Xu, G., Zhang, L., Shu, H., Wang, X., and Li, P. (2013). Access to lexical meaning
in pitch-flattened Chinese sentences: an fMRI study. Neuropsychologia 51,
550–556. doi: 10.1016/j.neuropsychologia.2012.12.006
Zhang, J., Xie, L., Li, Y., Chatterjee, M., and Ding, N. (2014). How noise and
language proficiency influence speech recognition by individual non-native
listeners. PLoS ONE 9:e113386. doi: 10.1371/journal.pone.0113386
Zhang, Y., Kuhl, P. K., Imada, T., Kotani, M., and Tohkura, Y. (2005). Effects
of language experience: neural commitment to language-specific auditory
patterns. Neuroimage 26, 703–720. doi: 10.1016/j.neuroimage.2005.02.040
Zhang, Y., and Wang, Y. (2007). Neural plasticity in speech learning and
acquisition. Biling. Lang. Cogn. 10, 147–160. doi: 10.1017/S1366728907002908
Conflict of Interest Statement: The authors declare that the research was
conducted in the absence of any commercial or financial relationships that could
be construed as a potential conflict of interest.
Copyright © 2016 Zhang, Li, Wu, Li, Shu, Zhang and Li. This is an open-access article
distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the
original author(s) or licensor are credited and that the original publication in this
journal is cited, in accordance with accepted academic practice. No use, distribution
or reproduction is permitted which does not comply with these terms.
Frontiers in Psychology | www.frontiersin.org 8June 2016 | Volume 7 | Article 908
... Previous studies have reported that non-native listeners can make use of auditory semantic-contextual cues (e.g., a previous sentence context) in adverse listening conditions to aid comprehension, but only when the auditory signal is of sufficient quality to facilitate access to semantic information (Bradlow & Alexander, 2007;Mayo, Florentine, & Buus, 1997;Zhang et al., 2016). Non-native listeners can also benefit from visual semantic cues from gestures (Dahl & Ludvigsen, 2014;Sueyoshi & Hardison, 2005), but this has been only studied behaviorally in clear speech and with low-proficient non-native listeners. ...
... Previous behavioral research on non-native degraded speech comprehension has been mostly tested in an auditory context, using only auditory semantic information in a verbal context as a modulating factor. These studies reported differences between native and highly proficient non-native listeners in terms of how previous auditory semantic context is taken into account during adverse listening conditions (Bradlow & Alexander, 2007;Bradlow & Bent, 2002;Gat & Keith, 1978;Golestani, Rosen, & Scott, 2009;Mayo et al., 1997;Oliver, Gullberg, Hellwig, Mitterer, & Indefrey, 2012;Shimizu, Makishima, Yoshida, & Yamagishi, 2002;Wijngaarden et al., 2002;Zhang et al., 2016). However, how these differences are reflected in neural activity remains unknown. ...
... Conversely, native listeners could benefit from acoustic and semantic information both in combination and separately. One of the explanations for this difference between native and non-native listeners is that non-native listeners might not be able to use semantic contextual information to resolve the information loss at the phoneme level when the signal clarity was insufficient (e.g., Bradlow & Alexander, 2007;Golestani et al., 2009;Oliver et al., 2012;Zhang et al., 2016). In line with this, another audiovisual behavioral study by Hazan et al. (2006) demonstrated that non-native listeners effectively incorporate and use visual cues from visible speech that are related to phonological features in the auditory signal to enhance speech comprehension in noise, and that increasing auditory proficiency is linked to an increased use of visual cues by nonnative listeners. ...
Article
Full-text available
Native listeners neurally integrate iconic gestures with speech, which can enhance degraded speech comprehension. However, it is unknown how non-native listeners neurally integrate speech and gestures, as they might process visual semantic context differently than natives. We recorded EEG while native and highly-proficient non-native listeners watched videos of an actress uttering an action verb in clear or degraded speech, accompanied by a matching ('to drive'+driving gesture) or mismatching gesture ('to drink'+mixing gesture). Degraded speech elicited an enhanced N400 amplitude compared to clear speech in both groups, revealing an increase in neural resources needed to resolve the spoken input. A larger N400 effect was found in clear speech for non-natives compared to natives, but in degraded speech only for natives. Non-native listeners might thus process gesture more strongly than natives when speech is clear, but need more auditory cues to facilitate access to gestural semantic information when speech is degraded.
... A potential explanation for these findings is that non-native listeners need more phonological cues to benefit from the semantic information that is conveyed by the gesture (Drijvers and € Ozyürek, 2019). This is in line with previous behavioral work that demonstrated that non-native listeners can only utilize auditory semantic-contextual cues for comprehension when the auditory signal is of sufficient quality to allow access to semantic cues (Bradlow and Alexander, 2007;Golestani et al., 2009;Hazan et al., 2006;Mayo et al., 1997;Oliver et al., 2012;Zhang et al., 2016). However, it is unknown which brain areas engage in this process over time, and how this differs from native listeners, who are not challenged by internally induced adverse listening conditions when understanding language. ...
... However, previous work suggested that non-native listeners might only be able to utilize semantic cues when the auditory signal is of sufficient quality to allow access to these semantic cues (Bradlow and Alexander, 2007;Golestani et al., 2009;Hazan et al., 2006;Mayo et al., 1997;Oliver et al., 2012;Zhang et al., 2016). Therefore, we conducted some exploratory analyses to investigate possible differences between the two groups. ...
... They can therefore already optimize their processing strategy in an early time window, whereas non-native listeners are not able to do this as it is more difficult for them to access the degraded input to map the semantic information from the gesture to. This is in line with unimodal, behavioral studies that investigated the effects of auditory semantic context on non-native degraded speech comprehension (Bradlow and Alexander, 2007;Golestani et al., 2009;Hazan et al., 2006;Mayo et al., 1997;Oliver et al., 2012;Zhang et al., 2016), and fits with our previous behavioral study (Drijvers and € Ozyürek, 2019) and EEG results (Drijvers and € Ozyürek, 2018). ...
Article
Full-text available
Listeners are often challenged by adverse listening conditions during language comprehension induced by external factors, such as noise, but also internal factors, such as being a non-native listener. Visible cues, such as semantic information conveyed by iconic gestures, can enhance language comprehension in such situations. Using magnetoencephalography (MEG) we investigated whether spatiotemporal oscillatory dynamics can predict a listener's benefit of iconic gestures during language comprehension in both internally (non-native versus native listeners) and externally (clear/degraded speech) induced adverse listening conditions. Proficient non-native speakers of Dutch were presented with videos in which an actress uttered a degraded or clear verb, accompanied by a gesture or not, and completed a cued-recall task after every video. The behavioral and oscillatory results obtained from non-native listeners were compared to an MEG study where we presented the same stimuli to native listeners (Drijvers et al., 2018a). Non-native listeners demonstrated a similar gestural enhancement effect as native listeners, but overall scored significantly slower on the cued-recall task. In both native and non-native listeners, an alpha/beta power suppression revealed engagement of the extended language network, motor and visual regions during gestural enhancement of degraded speech comprehension, suggesting similar core processes that support unification and lexical access processes. An individual's alpha/beta power modulation predicted the gestural benefit a listener experienced during degraded speech comprehension. Importantly, however, non-native listeners showed less engagement of the mouth area of the primary somatosensory cortex, left insula (beta), LIFG and ATL (alpha) than native listeners, which suggests that non-native listeners might be hindered in processing the degraded phonological cues and coupling them to the semantic information conveyed by the gesture. 
Native and non-native listeners thus demonstrated similar yet distinct spatiotemporal oscillatory dynamics when recruiting visual cues to disambiguate degraded speech.
... Our previous research, employing Mandarin Chinese speakers as participants, showed that there was interaction between sentence context and F0 contours during speech comprehension in quiet condition: normal sentences with flattened F0 contours were just as comprehended as normal sentences with natural F0 contours, syntactically/ semantically anomalous sentences with flattened F0 contours can not be as comprehended as syntactically/semantically anomalous sentences with natural F0 contours (Jiang et al., 2017;Wang et al., 2013;Xu et al., 2013;Zhang et al., 2016). However, the interaction was not significant in noise condition, for context can't plays work on top-down information to guide speech comprehension with noisy background . ...
... However, in syntactically/semantically anomalous sentences, flattening the F0 contours dramatically reduced the comprehension compared with the mild decrease in comprehension for normal sentences with natural F0 contours. In other words, people can use the context as a cue to comprehend the meaning of sentences, especially in the sentences with flattened F0 contours, this is consistent with our previous work (Jiang et al., 2017;Wang et al., 2013;Xu et al., 2013;Zhang et al., 2016). ...
Article
Sentence context and fundamental frequency (F0) contours are important factors to speech perception and comprehension. In Chinese-Mandarin, lexical tones can be distinguished by the F0 contours. Previous studies found healthy people could use the cue of context to recover the phonological representations of lexical tones from the altered tonal patterns to comprehend the sentences in quiet condition, but can not in noise environment. Lots of research showed that patients with schizophrenia have deficits of speech perception and comprehension. However, it is unclear how context and F0 contours influence speech perception and comprehension in patients with schizophrenia. This study detected the contribution of context and lexical tone to sentence comprehension in four types of sentences by manipulating the context and F0 contours in 32 patients with schizophrenia and 33 healthy controls. The results showed that (1) in patients with schizophrenia, the interaction between context and F0 contour was not significant, which was significant in healthy controls; (2) the scores of sentences with two types of sentences with flattened F0 contours were negatively correlated with hallucination trait scores; (3) the patients with schizophrenia showed significantly lower scores on the intelligibility of sentences in all conditions, which were negatively correlated with PANSS-P. The patients with schizophrenia couldn't use the cue of context to recover the phonological representations of lexical tones from the altered tonal patterns when they comprehend the sentences, inner noise may be the underlying mechanism for the deficits of speech perception and comprehension.
... Since most normal communication takes places with some background noise, the intelligibility and comprehensibility of the flat-tone sentences is likely to pose difficulty for listeners. For L2 listeners, they may not be familiar with the topics involved and, more importantly, the co-articulations of tones, and their listening ability is adversely affected by the environment (Bradlow and Bent 2002;Cooke et al. 2008;Cutler et al. 2008;Zhou et al. 2017;Zhang et al. 2016). Therefore, it is expected that the flat-intonation sentence will very likely pose even greater challenge for L2 listeners. ...
Chapter
This study examines the effects of segments, intonation and rhythm on the perception of second language (L2) accentedness and comprehensibility by focusing on a tone language, Mandarin Chinese. Fifteen Chinese sentences were manipulated by transferring the segments, intonation and rhythm between native and L2 speakers. 64 Chinese judges listened to the original and the manipulated sentences and were asked to rate the accentedness and comprehensibility of these sentences. Results of the Chinese native judges’ ratings showed that segments contribute more to the perception of L2 accentedness and comprehensibility than intonation and rhythm, and that intonation contributed more to L2 perception than rhythm. It was also found that accentedness ratings highly correlated with comprehensibility judgment. The findings of this study confirm what some recent studies have found regarding the contribution of segments and prosody to L2 perception, but differ from some previous studies in regards to the relationship between L2 accentedness and comprehensibility. This study has both theoretical and pedagogical implications.
... The speech materials were used in our previous studies with native Japanese speakers learning Chinese as a foreign language (Zhang et al., 2016) and Chinese school-age children (Zhou et al., 2017). Cronbach's alpha values ranged from 0.81 to 0.89 for different types of stimuli in all the studies including the present one, reflecting high internal consistency of each type of stimuli. ...
Article
Full-text available
Previous work has shown that children with dyslexia are impaired in speech recognition in adverse listening conditions. Our study further examined how semantic context and fundamental frequency (F0) contours contribute to word recognition against interfering speech in dyslexic and non-dyslexic children. Thirty-two children with dyslexia and 35 chronological-age-matched control children were tested on the recognition of words in normal sentences versus wordlist sentences with natural versus flat F0 contours against single-talker interference. The dyslexic children had overall poorer recognition performance than non-dyslexic children. Furthermore, semantic context differentially modulated the effect of F0 contours on the recognition performances of the two groups. Specifically, compared with flat F0 contours, natural F0 contours increased the recognition accuracy of dyslexic children less than non-dyslexic children in the wordlist condition. By contrast, natural F0 contours increased the recognition accuracy of both groups to a similar extent in the sentence condition. These results indicate that access to semantic context improves the effect of natural F0 contours on word recognition in adverse listening conditions by dyslexic children who are more impaired in the use of natural F0 contours during isolated and unrelated word recognition. Our findings have practical implications for communication with dyslexic children when listening conditions are unfavorable.
... This finding attributes to the general conclusion that F0 serves as a major cue in understanding speech in adverse listening or in a multiple speaker conditions which help us to identify the speaker and thereby helping to separate the target vs. the masker. This is in consensus with the previous literature (Song et al. 2011;Zhang et al. 2016). Even though older adults benefit less from the pitch/F0 cues compared to younger adults which is reflected even in the brainstem encoding (Helfer and Freyman 2008), it is indicated that F0 remains robust when compared to other parameters and correlates well with speech perception in noise scores (Anderson et al. 2011). ...
Article
Objective: The study aimed to investigate the influence of subcortical auditory processing and cognitive measures on cocktail party listening in younger and older adults with normal hearing sensitivity. Design: Tests administered included quick speech perception in noise test to assess cocktail party listening, speech auditory brainstem response to assess subcortical auditory processing and digit span, digit sequencing and spatial selective attention test to assess cognitive processing. Study sample: A total of 92 participants with normal hearing sensitivity participated in the study. They were divided into two groups: 52 young adults (20–40 years) and 40 older adults (60–80 years). Results: The older adults performed significantly poorer than, the younger adults on the quick speech perception in noise test and various cognitive measures. Further, cognitive measures correlated with speech perception in noise in younger and older adults. The results of this study also showed that there was a significant deterioration in brainstem encoding of speech with ageing. Further, it was also noted that the fundamental frequency of the speech auditory brainstem response correlated with speech perception in noise. Conclusions: It can be concluded from this study that subcortical auditory processing and cognitive measures play a role in cocktail party listening.
... Japanese-speaking Mandarin learners of elementary and intermediate proficiency levels were exposed to utterances in quiet and noisy listening conditions. When recognizing L2 Mandarin speech, the Japanese-speaking Mandarin learners were affected by their L2 proficiency, the semantic context, F0 contours, and noise (Zhang et al., 2016). In another study, students in an introductory Chinese language course were trained to identify tones via three different training types. ...
Article
Full-text available
A growing number of studies on the acquisition of lexical tone by adult learners have revealed that factors such as language background, musical experience, cognitive abilities, and neuroanatomy all play a role in determining tone learning success. On the basis of these findings, it has been argued that the effectiveness of tone learning in adulthood depends on individual differences in these factors. However, it is not clear whether similar individual differences play an analogous role in tone learning in childhood. Indeed, relatively few studies have made comparisons between how adults and children learn lexical tones. Here, we review recent developments for tone learning in both adults and children. The review covers tone training in a range of contexts, including in naive listeners, in native speakers of other tone languages, in listeners with varying levels of musical experience, and in individuals with speech and hearing disorders. Finally, we discuss the parallels between adult and child tone learning, and provide recommendations concerning how findings in adult tone training can provide insights into tone learning for children by accommodating the needs of individual learners.
... In other words, the Chinese listeners with relatively "noisier" FFR pitch representation were also less efficient in vowel classification presumably with a heavier reliance on the higher-order processing based on their lexical tone knowledge. A similar phenomenon is seen in speech-innoise perception comparing native vs. non-native listeners (Zhang et al., 2016a). In the presence of background noise, native listeners generally display advantage in speech comprehension due to accessibility of higher-order linguistic knowledge to compensate for sensory signal degradation, whereas non-native listeners are more susceptible to noise due to insufficient top-down linguistic influence (Bidelman and Dexter, 2015). ...
Article
A current topic in auditory neurophysiology is how brainstem sensory coding contributes to higher-level perceptual, linguistic and cognitive skills. This cross-language study was designed to compare frequency following responses (FFRs) for lexical tones in tonal (Mandarin Chinese) and non-tonal (English) language users and test the correlational strength between FFRs and behavior as a function of language experience. The behavioral measures were obtained in the Garner paradigm to assess how lexical tones might interfere with vowel category and duration judgement. The FFR results replicated previous findings about between-group differences, showing enhanced pitch tracking responses in the Chinese subjects. The behavioral data from the two subject groups showed that lexical tone variation in the vowel stimuli significantly interfered with vowel identification with a greater effect in the Chinese group. Moreover, the FFRs for lexical tone contours were significantly correlated with the behavioral interference only in the Chinese group. This pattern of language-specific association between speech perception and brainstem-level neural phase-locking of linguistic pitch information provides evidence for a possible native language neural commitment at the subcortical level, highlighting the role of experience-dependent brainstem tuning in influencing subsequent linguistic processing in the adult brain.
Article
Despite the growth of research in learning and teaching Chinese as a Foreign Language (CFL), no scoping review of research published in international, anglophone journals has been published so far. A total of 289 journal articles published in 95 journals were identified and used to provide a bibliometric mapping of research in CFL over three decades. Data from the sampled articles reveals a great diversity of focus in CFL research that has been conducted in more than 24 countries. The included articles also reflect an upsurge in research intensity across several key areas of focus, some of which are related to the distinctive linguistic features of the Chinese language. We also report on the research methods employed by the studies in our sample and the characteristics of their participants. Our mapping of the field identifies gaps in the existing literature which may subsequently inform any focused or comprehensive reviews. We conclude by setting out some implications for future CFL research, both in terms of substantive areas of focus and methodological approaches.
Chapter
The perception of acoustic and phonological information in lexical tones is crucial for understanding Chinese words correctly. Research in the past has considered the linguistic functions of both acoustic and phonological information. However, it has been debated whether Chinese lexical tones are processed in the right or the left hemisphere, and whether different types of information may be handled differently in the two hemispheres. For native Chinese speakers (L1), the acoustic information of tones appears to be processed in the right hemisphere, whereas the phonological information of tones is mostly processed in the left hemisphere. For second language (L2) Chinese learners, it has been hypothesized that they may show a right-lateralized pattern for processing both acoustic and phonological information at the early stage of Chinese learning; when their processing of these two types of information improves at a later stage of Chinese learning, native-like patterns emerge. In this chapter, we discuss how these two types of information play their roles in the processing of lexical tones in Chinese by both native speakers and second language learners of Chinese.
Article
Full-text available
Both long-term native language experience and immediate linguistic expectations can affect listeners' use of acoustic information when making a phonetic decision. In this study, a Garner selective attention task was used to investigate differences in attention to consonants and tones by American English-speaking listeners (N = 20) and Mandarin Chinese-speaking listeners hearing speech in either American English (N = 17) or Mandarin Chinese (N = 20). To minimize the effects of lexical differences and differences in the linguistic status of pitch across the two languages, stimuli and response conditions were selected such that all tokens constitute legitimate words in both languages and all responses required listeners to make decisions that were linguistically meaningful in their native language. Results showed that regardless of ambient language, Chinese listeners processed consonant and tone in a combined manner, consistent with previous research. In contrast, English listeners treated tones and consonants as perceptually separable. Results are discussed in terms of the role of sub-phonemic differences in acoustic cues across language, and the linguistic status of consonants and pitch contours in the two languages.
Article
Full-text available
This study investigated how speech recognition in noise is affected by language proficiency for individual non-native speakers. The recognition of English and Chinese sentences was measured as a function of the signal-to-noise ratio (SNR) in sixty native Chinese speakers who had never lived in an English-speaking environment. The recognition score for speech in quiet (which varied from 15% to 92%) was found to be uncorrelated with the speech recognition threshold (SRT), i.e., the SNR at which the recognition score drops to 50% of the recognition score in quiet. This result demonstrates separable contributions of language proficiency and auditory processing to speech recognition in noise.
Article
Full-text available
This study examined the effects of lexical tone contour on the intelligibility of Mandarin sentences in quiet and in noise. A text-to-speech synthesis engine was used to synthesize Mandarin sentences with each word carrying the original lexical tone, a flat tone, or a tone randomly selected from the four Mandarin lexical tones. The synthesized speech signals were presented to 11 normal-hearing listeners for recognition in quiet and in speech-shaped noise at 0 dB signal-to-noise ratio. Normal-hearing listeners nearly perfectly recognized the Mandarin sentences produced with modified tone contours in quiet; however, performance declined substantially in noise. Consistent with previous findings to some extent, the present findings suggest that lexical tones are relatively redundant cues for Mandarin sentence intelligibility in quiet, and that other cues can compensate for the distorted lexical tone contours. In noise, however, the results provide direct evidence that lexical tone contours are important for the recognition of Mandarin sentences.
Article
Full-text available
Flattening the fundamental frequency (F0) contours of Mandarin Chinese sentences reduces their intelligibility in noise but not in quiet. It is unclear, however, how the absence of the primary acoustic cue for lexical tones might be compensated by the top-down information of sentence context. In this study, speech intelligibility was evaluated when participants listened to sentences and word lists with or without F0 variations in quiet and in noise. The results showed that sentence context partially explained the unchanged intelligibility of monotonous Chinese sentences in quiet and further indicated that F0 variations and sentence context act in concert during speech comprehension.
Article
Full-text available
The current study investigates the learning of nonnative suprasegmental patterns for word identification. Native English-speaking adults learned to use suprasegmentals (pitch patterns) to identify a vocabulary of six English pseudosyllables superimposed with three pitch patterns (18 words). Successful learning of the vocabulary necessarily entailed learning to use pitch patterns in words. Two major facets of sound-to-word learning were investigated: whether native speakers of a nontone language could learn to use pitch patterns for lexical identification, and what effect more basic auditory ability had on learning success. We found that all subjects improved to a certain degree, although large individual differences were observed. Learning success was found to be associated with the learners' ability to perceive pitch patterns in a nonlexical context and their previous musical experience. These results suggest the importance of a phonetic–phonological–lexical continuity in adult nonnative word learning, including phonological awareness and general auditory ability.
Article
Chinese is a tonal language in which variation in pitch is used to distinguish word meanings. Thus, in order to understand a word, listeners have to extract the pitch patterns in addition to its phonemes. Can the correct word meaning still be accessed in sentence contexts if pitch patterns of words are altered? If so, how is this accomplished? The present study attempts to address such questions with event-related functional magnetic resonance imaging (fMRI). Native speakers of Mandarin Chinese listened to normal and pitch-flattened (monotone) speech inside the scanner. The behavioral results indicated that they rated monotone sentences as intelligible as normal sentences, and performed equally well in a dictation test on the two types of sentences. The fMRI results showed that both types of sentences elicited similar activation in the left insular, middle and inferior temporal gyri, but the monotone sentences elicited greater activation in the left planum temporale (PT) compared with normal sentences. These results demonstrate that lexical meaning can still be accessed in pitch-flattened Chinese sentences, and that this process is realized by automatic recovery of the phonological representations of lexical tones from the altered tonal patterns. Our findings suggest that the details of spoken pitch patterns are not essential for adequate lexical-semantic processing during sentence comprehension even in tonal languages like Mandarin Chinese, given that listeners can automatically use additional neural and cognitive resources to recover distorted tonal patterns in sentences.
Article
In 4 classification tasks, requiring attention to 1 dimension (either segmental or tonal) of consonant/vowel (CV) syllables while ignoring the other, 8 native English speakers and 8 native Chinese speakers showed strong perceptual integrality (i.e., interference from orthogonal variation in the unattended dimension). Chinese listeners showed significantly more integrality than English listeners in only 1 task. Chinese and English listeners both show an underlying processing asymmetry between consonants and tones in CV syllables, whereas only Chinese listeners show such an asymmetry between vowels and tones (vowels being more integral with tones than vice versa).