Conference Paper

Effects of Voice Type and Task on L2 Learners’ Awareness of Pronunciation Errors


Chapter
The aim of our research is to understand how speech learning changes over the life span and to explain why "earlier is better" as far as learning to pronounce a second language (L2) is concerned. An assumption we make is that the phonetic systems used in the production and perception of vowels and consonants remain adaptive over the life span, and that phonetic systems reorganize in response to sounds encountered in an L2 through the addition of new phonetic categories, or through the modification of old ones. The chapter is organized in the following way. Several general hypotheses concerning the cause of foreign accent in L2 speech production are summarized in the introductory section. In the next section, a model of L2 speech learning that aims to account for age-related changes in L2 pronunciation is presented. The next three sections present summaries of empirical research dealing with the production and perception of L2 vowels, word-initial consonants, and word-final consonants. The final section discusses questions of general theoretical interest, with special attention to a featural (as opposed to a segmental) level of analysis. Although nonsegmental (i.e., prosodic) dimensions are an important source of foreign accent, the present chapter focuses on phoneme-sized units of speech. Although many different languages are learned as an L2, the focus is on the acquisition of English.
Article
For deep learning based speech segregation to have translational significance as a noise-reduction tool, it must perform in a wide variety of acoustic environments. In the current study, performance was examined when target speech was subjected to interference from a single talker and room reverberation. Conditions were compared in which an algorithm was trained to remove both reverberation and interfering speech, or only interfering speech. A recurrent neural network incorporating bidirectional long short-term memory was trained to estimate the ideal ratio mask corresponding to target speech. Substantial intelligibility improvements were found for hearing-impaired (HI) and normal-hearing (NH) listeners across a range of target-to-interferer ratios (TIRs). HI listeners performed better with reverberation removed, whereas NH listeners demonstrated no difference. Algorithm benefit averaged 56 percentage points for the HI listeners at the least-favorable TIR, allowing these listeners to perform numerically better than young NH listeners without processing. The current study highlights the difficulty associated with perceiving speech in reverberant-noisy environments, and it extends the range of environments in which deep learning based speech segregation can be effectively applied. This increasingly wide array of environments includes not only a variety of background noises and interfering speech, but also room reverberation.
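The ideal ratio mask named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition, in which each time-frequency unit gets the ratio of target energy to total energy raised to an exponent of 0.5; the abstract does not give the exact formula, so the function name `ideal_ratio_mask`, the `beta` parameter, and the toy energies below are illustrative assumptions rather than the authors' implementation.

```python
def ideal_ratio_mask(target_energy, interference_energy, beta=0.5):
    """Compute a ratio mask per time-frequency unit.

    Both inputs are 2-D lists (frames x frequency channels) of non-negative
    energies. The formula (S / (S + N)) ** beta is an assumed, commonly used
    definition, not one taken from the abstract.
    """
    mask = []
    for s_row, n_row in zip(target_energy, interference_energy):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            # A unit with no energy at all is masked out entirely.
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


# Toy energies (hypothetical values): 2 frames x 2 channels.
target = [[4.0, 0.0], [1.0, 9.0]]
interference = [[0.0, 4.0], [3.0, 16.0]]
mask = ideal_ratio_mask(target, interference)
```

Units dominated by the target receive values near 1 and units dominated by the interference receive values near 0. In a reverberant condition like the one described, the interference term would plausibly include reverberant energy as well as the competing talker.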
Article
This study presents a cross-sectional and longitudinal analysis of how 108 high school students in English-as-a-Foreign-Language (EFL) classrooms enhanced the comprehensibility of their second language (L2) speech according to different motivation, emotion and experience profiles. Overall, the students' learning patterns were primarily associated with their emotional states (anxiety vs. enjoyment), and secondarily with their motivational dispositions (clear vision of ideal future selves). The students' anxiety (together with weaker Ideal L2 Self) negatively related to their performance at the beginning of the project which they had achieved after several years of EFL instruction. Their enjoyment (together with greater Ideal L2 Self) predicted the extent to which they practiced and developed their L2 speech within the time framework of the project (three months). The results suggest that more regular/frequent L2 use with positive emotions directly impacts acquisition, which may in turn lead to the lessening of negative emotions and better L2 proficiency in the long run.
Article
Shadowing has increasingly been recognized as an effective practice for developing listening skills in second language learning. However, there is very little research focusing on learners’ psychological aspects in implementing shadowing practice. The aim of this study is to explore second language learners’ psychological factors, from the motivation framework point of view, in relation to shadowing practice in a Japanese as a foreign language context. This study addresses research questions regarding: (1) perceived effectiveness of shadowing; (2) differences in perception depending on shadowing performance skills; (3) factors that encourage learners to continue shadowing; and (4) perceived positive and negative aspects of shadowing. The participants were 36 university students who were enrolled in an advanced Japanese language unit at an Australian university. They were asked to complete a written survey containing 35 questionnaire items and 3 open-ended questions at the end of the study period. The study employs mixed methods, combining quantitative and qualitative approaches, to analyze the results and findings. The results indicate that the majority of participants perceive shadowing as effective for both listening and speaking skills, and agree on the usefulness of feedback. However, individual differences were found in how they favor the shadowing speed in relation to their comprehension of the content. Implications for classroom application are also discussed.
Article
We present a voice morphing strategy that can be used to generate a continuum of accent transformations between a foreign speaker and a native speaker. The approach performs a cepstral decomposition of speech into spectral slope and spectral detail. Accent conversions are then generated by combining the spectral slope of the foreign speaker with a morph of the spectral detail of the native speaker. Spectral morphing is achieved by representing the spectral detail through pulse density modulation and averaging pulses in a pair-wise fashion. The technique is evaluated on parallel recordings from two ARCTIC speakers using objective measures of acoustic quality, speaker identity and foreign accent that have been recently shown to correlate with perceptual results from listening tests.
Chapter
Language experience systematically constrains perception of speech contrasts that deviate phonologically and/or phonetically from those of the listener's native language. These effects are most dramatic in adults, but begin to emerge in infancy and undergo further development through at least early childhood. The central question addressed here is: How do nonnative speech perception findings bear on phonological and phonetic aspects of second language (L2) perceptual learning? A frequent assumption has been that nonnative speech perception can also account for the relative difficulties that late learners have with specific L2 segments and contrasts. However, evaluation of this assumption must take into account the fact that models of nonnative speech perception such as the Perceptual Assimilation Model (PAM) have focused primarily on naive listeners, whereas models of L2 speech acquisition such as the Speech Learning Model (SLM) have focused on experienced listeners. This chapter probes the assumption that L2 perceptual learning is determined by nonnative speech perception principles, by considering the commonalities and complementarities between inexperienced listeners and those learning an L2, as viewed from PAM and SLM. Among the issues examined are how language learning may affect perception of phonetic vs. phonological information, how monolingual vs. multiple language experience may impact perception, and what these may imply for attunement of speech perception to changes in the listener's language environment.
Article
Proposals for task-based approaches to pedagogy have conceded that valid criteria for determining the difficulty level of tasks have yet to be established. We suggest that this is an important area for future research by practicing teachers and we present rationales for three dimensions of task demands which we think may affect the difficulty level of tasks: amount of cognitive load imposed, amount of planning time allowed, and amount of prior information supplied. We briefly report the results of our own studies of speaking, writing and listening tasks which varied along these dimensions, and describe the units of analysis used to analyse task performance. Finally we discuss the implications of our investigations of task complexity for task-based syllabus design.
Conference Paper
A new method for source information extraction is proposed. The aim of the method is to provide optimal source information for the very high quality speech manipulation system STRAIGHT. The method is based on both time interval and frequency cues, and it provides fundamental frequency and periodicity information within each frequency band, to allow mixed mode excitation. The method is designed to minimize perceptual disturbance due to errors in source information extraction. A preliminary evaluation using a database of simultaneously recorded EGG and speech signals yielded very low gross error rates (0.029% for females and 0.14% for males). In addition, the method is designed so as to minimize the perceptual disturbance caused by any such gross error.
Article
Learners of a second language practice their pronunciation by listening to and imitating utterances from native speakers. Recent research has shown that choosing a well-matched native speaker to imitate can have a positive impact on pronunciation training. Here we propose a voice-transformation technique that can be used to generate the (arguably) ideal voice to imitate: the own voice of the learner with a native accent. Our work extends previous research, which suggests that providing learners with prosodically corrected versions of their utterances can be a suitable form of feedback in computer assisted pronunciation training. Our technique provides a conversion of both prosodic and segmental characteristics by means of a pitch-synchronous decomposition of speech into glottal excitation and spectral envelope. We apply the technique to a corpus containing parallel recordings of foreign-accented and native-accented utterances, and validate the resulting accent conversions through a series of perceptual experiments. Our results indicate that the technique can reduce foreign accentedness without significantly altering the voice quality properties of the foreign speaker. Finally, we propose a pedagogical strategy for integrating accent conversion as a form of behavioral shaping in computer assisted pronunciation training.
Article
The type of voice model used in Computer Assisted Pronunciation Instruction is a crucial factor in the quality of practice and the amount of uptake by language learners. As an example, prior research indicates that second-language learners are more likely to succeed when they imitate a speaker with a voice similar to their own, a so-called “golden speaker”. This manuscript presents Golden Speaker Builder (GSB), a tool that allows learners to generate a personalized “golden-speaker” voice: one that mirrors their own voice but with a native accent. We describe the overall system design, including the web application with its user interface, and the underlying speech analysis/synthesis algorithms. Next, we present results from a series of listening tests, which show that GSB is capable of synthesizing such golden-speaker voices. Finally, we present results from a user study in a language-instruction setting, which show that practising with GSB leads to improved fluency and comprehensibility. We suggest reasons for why learners improved as they did and recommendations for the next iteration of the training.
Article
This study investigated whether second language (L2) speakers are aware of and can manipulate aspects of their speech contributing to comprehensibility. Forty Mandarin speakers of L2 English performed two versions of the same oral task. Before the second task, half of the speakers were asked to make their speech as easy for the interlocutor to understand as possible, while the other half received no additional prompt. Speakers self-assessed comprehensibility after each task and were interviewed about how they improved their comprehensibility. Native-speaking listeners evaluated speaker performances for five dimensions, rating speech similarly across groups and tasks. Overall, participants did not become more comprehensible from task 1 to task 2, whether prompted or not, nor did speakers’ self-assessments become more in line with raters’, indicating speakers may not be aware of their own comprehensibility. However, speakers who did demonstrate greater improvement in comprehensibility received higher ratings of flow, and speakers’ self-ratings of comprehensibility were aligned with listeners’ assessments only in the second task. When discussing comprehensibility, speakers commented more on task content than linguistic dimensions. Results highlight the roles of task repetition and self-assessment in speakers’ awareness of comprehensibility.
Article
This article investigates the effect of explicit individual corrective feedback (ICF) on L2 pronunciation at the micro-level in order to determine whether ICF needs to complement listening only interventions. To this purpose, the authors carried out a study which investigated the immediate effect of feedback on comprehensibility of controlled speech production by L2 learners. 169 adult learners of German were assigned to two groups, one exposed to listening only activities (listening to their own recorded pronunciation and listening to teachers' model pronunciation) and the other receiving ICF in addition to the listening activities. Immediately before and after the respective interventions, the participants read a text, and two experienced judges rated in a blind and randomized rating task whether they could determine differences between the comprehensibility of the pre-test and post-test samples. The results show that ICF was more effective than listening only interventions in improving L2 comprehensibility. The study thus concludes that ICF is a significantly more powerful teaching tool than listening only activities.
Article
Justifies the immediate and vigorous development of computer-based speech training (CBST). A taxonomy distinguishes among 48 possible types of CBST systems in terms of (1) physical source of feedback (FBK), (2) standards of evaluation against which new productions are judged, and (3) amount and type of detail extracted from the productions used to generate FBK, as well as the processing level of the FBK information. Theoretical issues relevant to design are examined. A continuing, large-scale effort is needed to determine optimal forms of FBK for use in CBST systems.
Conference Paper
This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable. Instead of using word or phone strings as search terms, the user presents the system with audio snippets of desired search terms to act as the queries. Query and test materials are represented using phonetic posteriorgrams obtained from a phonetic recognition system. Query matches in the test data are located using a modified dynamic time warping search between query templates and test utterances. Experiments using this approach are presented using data from the Fisher corpus.
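The matching step described in this abstract can be sketched as a dynamic time warping search in which the query may start and end anywhere in the test utterance. This is a plain subsequence DTW, not the paper's modified variant, whose details the abstract does not give. The local distance (negative log of the inner product of two posterior vectors), the function names, and the two-phone toy posteriorgrams are all assumptions made for illustration.

```python
import math


def local_distance(p, q, eps=1e-10):
    """Negative log inner product of two posterior vectors (assumed distance)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))  # floor avoids log(0)


def subsequence_dtw(query, test):
    """Align `query` against any contiguous region of `test`.

    Returns (length-normalized cost, index of the test frame where the
    best match ends). Lower cost means a better match.
    """
    n, m = len(query), len(test)
    inf = float("inf")
    # cost[i][j]: cheapest path aligning query[:i + 1] and ending at test frame j
    cost = [[inf] * m for _ in range(n)]
    for j in range(m):
        # The match may begin at any test frame without penalty.
        cost[0][j] = local_distance(query[0], test[j])
    for i in range(1, n):
        for j in range(m):
            best_prev = cost[i - 1][j]
            if j > 0:
                best_prev = min(best_prev, cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = local_distance(query[i], test[j]) + best_prev
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, end


# Hypothetical posteriorgrams over a two-phone inventory.
query = [[1.0, 0.0], [0.0, 1.0]]
test = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
cost, end = subsequence_dtw(query, test)
```

In a detection setting, the normalized cost would typically be compared against a threshold, with the end frame (and a backtrace, omitted here) locating the hit in the test audio.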
Article
It is generally assumed that second language (L2) learners find it difficult to self-assess their pronunciation skills. In view of the benefits of self-assessment for the language learning process and the need to monitor one’s pronunciation in independent learning environments, we investigated the reliability of self-assessments of pronunciation skills and set out to better understand the causes of difficulties. In our study, 46 advanced learners of German assessed their own articulation of different speech sounds in comparison with the sounds produced by a native speaker. In 85% of all cases the assessments of the raters and the self-assessments were identical. However, the learners only identified half of the number of speech sounds which the raters believed to be inaccurate. The study therefore confirms that even experienced L2 learners seem to find it difficult to self-assess correctly their pronunciation skills. In this paper we have explored a number of reasons for these difficulties in order to identify ways of further improving the self-assessment of L2 pronunciation.
Article
In the past, educators relied on classroom observation to determine the relevance of various pedagogical techniques. Automated language learning now allows us to examine pedagogical questions in a much more rigorous manner. We can use a computer-assisted language learning (CALL) system as a base, tracing all user responses and controlling the information given out. We have thus used the Fluency system [Proceedings of Speech Technology in Language and Learning, 1998, p. 77] to answer the question of what voice a language learner should imitate when working on pronunciation. In this article, we will examine whether there should be a choice of model speakers and what characteristics of a model's voice may be important to match when there is a choice.
Article
In the field of psychology, the practice of p value null-hypothesis testing is as widespread as ever. Despite this popularity, or perhaps because of it, most psychologists are not aware of the statistical peculiarities of the p value procedure. In particular, p values are based on data that were never observed, and these hypothetical data are themselves influenced by subjective intentions. Moreover, p values do not quantify statistical evidence. This article reviews these p value problems and illustrates each problem with concrete examples. The three problems are familiar to statisticians but may be new to psychologists. A practical solution to these p value problems is to adopt a model selection perspective and use the Bayesian information criterion (BIC) for statistical inference (Raftery, 1995). The BIC provides an approximation to a Bayesian hypothesis test, does not require the specification of priors, and can be easily calculated from SPSS output.
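As a rough illustration of the BIC-based inference this abstract recommends, the sketch below uses the standard definition BIC = -2 ln L + k ln n and the usual approximation of the Bayes factor from a BIC difference. The function names and the log-likelihood values are hypothetical; they are not taken from the article.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def approx_bayes_factor_01(bic_h0, bic_h1):
    """Approximate Bayes factor for H0 over H1: exp((BIC_H1 - BIC_H0) / 2).

    Values above 1 favor H0; values below 1 favor H1.
    """
    return math.exp((bic_h1 - bic_h0) / 2.0)


# Hypothetical fits to n = 100 observations: H1 adds one parameter and
# raises the log-likelihood by 1 unit.
bic_h0 = bic(log_likelihood=-100.0, n_params=2, n_obs=100)
bic_h1 = bic(log_likelihood=-99.0, n_params=3, n_obs=100)
bf01 = approx_bayes_factor_01(bic_h0, bic_h1)
```

With these made-up numbers the extra parameter does not buy enough fit to offset the ln n penalty, so the approximate Bayes factor comes out above 1 and favors the simpler model.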