Article

Automatic sentence stress feedback for non-native English learners


Abstract and Figures

This paper proposes a sentence stress feedback system in which sentence stress prediction, detection, and feedback provision models are combined. This system provides non-native learners with feedback on sentence stress errors so that they can improve their English rhythm and fluency in a self-study setting. The sentence stress feedback system was devised to predict and detect the sentence stress of any practice sentence. The accuracy of the prediction and detection models was 96.6% and 84.1%, respectively. The stress feedback provision model offers positive or negative stress feedback for each spoken word by comparing the probability of the predicted stress pattern with that of the detected stress pattern. In an experiment that evaluated the educational effect of the proposed system incorporated in our CALL system, significant improvements in accentedness and rhythm were seen with the students who trained with our system but not with those in the control group.
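The feedback step described above compares the probability of the predicted (reference) stress pattern with that of the detected pattern for each spoken word. A minimal sketch of that comparison, not the authors' implementation: the function name and the fixed tolerance `margin` are assumptions.

```python
# Hypothetical sketch of word-level stress feedback: compare the probability
# of the predicted (reference) stress pattern with the detected one.

def stress_feedback(words, predicted_probs, detected_probs, margin=0.2):
    """Return ('positive'|'negative') feedback per word.

    `margin` is an assumed tolerance: feedback is positive when the detected
    stress probability is within `margin` of the predicted one.
    """
    feedback = []
    for word, p_pred, p_det in zip(words, predicted_probs, detected_probs):
        ok = abs(p_pred - p_det) <= margin
        feedback.append((word, "positive" if ok else "negative"))
    return feedback

# Example: the learner under-stressed "LOVE".
fb = stress_feedback(["I", "LOVE", "music"], [0.1, 0.9, 0.6], [0.15, 0.4, 0.55])
```

A real system would derive the two probability sequences from the prediction and detection models; here they are invented values for illustration.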


... Piske et al. [2] summarized some of these factors, including age of L2 learning, length of residence in an L2-speaking country, gender, formal instruction, motivation, language learning aptitude, and amount of native language use. Phonologically, a large body of research has also examined the influence of segmental and suprasegmental features on the degree of perceived foreign accent [3,4,5,6]. However, interest has mainly focused on segmental deviations from native pronunciation, owing to the complexity of prosody and the pedagogical challenges it poses [7]. ...
... However, interest has mainly focused on segmental deviations from native pronunciation, owing to the complexity of prosody and the pedagogical challenges it poses [7]. Although perceived accent arising from suprasegmental factors has received comparatively little investigation, some research has argued that prosody also contributes significantly to the overall impression of foreign accent [5]. ...
... Improper sentence stress production is also a primary cause of perceived foreign accent. Studies have corroborated that reducing sentence stress errors can help non-native learners produce English rhythm in a more comprehensible and native-like way [5,8,9]. Nevertheless, many Chinese learners of English struggle with the complexity of English prosody because they are unaware of the striking differences between their L1 and the target L2. ...
... Sentence stress forms a natural stress pattern characteristic of a given language and emphasizes particular words according to their relative importance. Sentence stress is distinct from pitch accent, which carries both pitch prominence caused by an intonation event and rhythmic prominence caused by sentence stress [5]. ...
... The standard approach to prosody annotation is based on Tone and Break Indices (ToBI) [25]. Because ToBI focuses on pitch accent, which is not identical to sentence stress, we use the Aix-MARSEC (Aix-Machine Readable Spoken English Corpus) database [26], as in [5]. Aix-MARSEC consists of over 5 hours of BBC radio recordings from the 1980s by 53 different speakers in 11 different speech styles. ...
... We use the original phrase break annotations for minor and major boundaries, which are equivalent to break indices 3 and 4 in ToBI [27]. Following previous work, we treat a syllable as stressed when it appears first in each of Jassem's narrow rhythm units (NRUs) [5]. For practical purposes, we merge minor and major boundaries into break labels. ...
Article
Full-text available
Prosodic event detection plays an important role in spoken language processing tasks and Computer-Assisted Pronunciation Training (CAPT) systems [1]. Traditional methods for the detection of sentence stress and phrase boundaries rely on machine learning methods that model limited contextual information and take little account of the interaction between these two prosodic events. In this paper, we propose a hierarchical network modeling the contextual factors at the granularity of phoneme, syllable, and word based on bidirectional Long Short-Term Memory (BLSTM). Moreover, to account for the inherent connection between sentence stress and phrase boundaries, we jointly model these two important prosodic events with a multitask learning (MTL) framework that shares common prosodic features. We evaluate the network on the Aix-Machine Readable Spoken English Corpus (Aix-MARSEC). Experimental results show that our proposed method obtains an F1-measure of 90% for sentence stress detection and 91% for phrase boundary detection, outperforming the conditional random field (CRF) baseline by about 4% and 9%, respectively.
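The F1-measures reported above combine precision and recall over the detected events. A minimal computation over binary event labels, shown for illustration only (the label convention 1 = event present is an assumption):

```python
def f1_measure(reference, hypothesis):
    """F1 score for binary prosodic-event labels (1 = event present)."""
    tp = sum(1 for r, h in zip(reference, hypothesis) if r == 1 and h == 1)
    pred_pos = sum(hypothesis)  # events the detector claimed
    ref_pos = sum(reference)    # events in the gold annotation
    if tp == 0:
        return 0.0
    precision = tp / pred_pos
    recall = tp / ref_pos
    return 2 * precision * recall / (precision + recall)

# Toy example: 3 reference stresses, 3 predicted, 2 correct.
score = f1_measure([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```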
... In order to carry out the rhythm analyses, we used the Korean Learners' English Accentuation Corpus (KLEAC), which was made to develop an automatic sentence stress prediction, detection, and feedback system (Lee et al., 2017). This database consists of recordings of 5,500 English sentences read by 75 Korean learners, who were middle school students aged between 13 and 14. ...
... The labelers showed very strong interrater agreement (e.g., Fleiss's κ was .868; see Lee et al., 2017, for further details). ...
... The results of this paper suggest that incorporating our sentence stress metrics into an automatic English speech scoring system may improve its accuracy. Lee et al. (2017) have developed an automatic sentence stress prediction, detection, and feedback system. This system analyzes learners' sentence stress placement using a detection model that is trained using acoustic, lexical, and syntactic features, then compares it with a reference generated by a prediction model, and offers feedback to the learners on the errors that they made. ...
Article
Full-text available
Previous research has suggested that the production of speech rhythm in a second language (L2) or foreign language is influenced by the speaker's first language rhythm. However, it is less clear how the production of L2 rhythm is affected by learners' L2 proficiency, largely owing to the lack of rhythm metrics that show consistent results across studies. We examined the production of English rhythm by 75 Korean learners using the rhythm metrics proposed in previous studies (pairwise variability indices and interval measures). We also devised new sentence stress measures (i.e., accentuation rate and accentuation error rate) and investigated whether these new measures can quantify rhythmic differences between the learners. The results revealed no rhythm metric that correlated significantly with proficiency in the expected direction. In contrast, we found a significant correlation between the learners' proficiency levels and both measures of sentence stress, showing that less proficient learners placed sentence stress on more words and made more sentence stress errors. This demonstrates that our sentence stress measures can serve as effective features for assessing Korean learners' English rhythm proficiency.
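One plausible formulation of the two sentence stress measures named above, hedged because the paper's exact definitions may differ: accentuation rate as the proportion of words a speaker stresses, and accentuation error rate as the proportion of words whose stress label disagrees with a native reference.

```python
def accentuation_rate(stress_flags):
    """Fraction of words the speaker stressed (1 = stressed, 0 = unstressed)."""
    return sum(stress_flags) / len(stress_flags)

def accentuation_error_rate(learner_flags, reference_flags):
    """Fraction of words whose stress label disagrees with the reference."""
    errors = sum(1 for l, r in zip(learner_flags, reference_flags) if l != r)
    return errors / len(reference_flags)

# Toy four-word sentence: the learner stressed an extra word.
rate = accentuation_rate([1, 1, 0, 1])                    # 3 of 4 words stressed
err = accentuation_error_rate([1, 1, 0, 1], [0, 1, 0, 1])  # 1 mismatch of 4
```

Under this formulation, the paper's finding reads naturally: less proficient learners show both a higher accentuation rate and a higher accentuation error rate.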
... Prosody and suprasegmental related pronunciation phenomena are important for learning native-like speaking skills. Some research has paid attention to language learners' lexical stress [1,2], intonation [3], and vowel reduction [4]. ...
... We take advantage of the hierarchical attention mechanism. In detail, the context vectors r^1_l and r^2_l from the main and auxiliary encoders are computed in a similar way to Equation (2). The fusion vector containing multi-stream information is obtained as a combination of r^1_l and r^2_l as follows: ...
Article
Full-text available
Vowel reduction is a common pronunciation phenomenon in stress-timed languages like English. Native speakers tend to weaken unstressed vowels into a schwa-like sound. It is an essential factor that makes the accent of language learners sound unnatural. To improve vowel reduction detection in a phoneme recognition framework, we propose an end-to-end vowel reduction detection method that introduces pronunciation prior knowledge as auxiliary information. In particular, we have designed two methods for automatically generating pronunciation prior sequences from reference texts and have implemented a main and auxiliary encoder structure that uses hierarchical attention mechanisms to utilize the pronunciation prior information and acoustic information dynamically. In addition, we also propose a method to realize the feature enhancement after encoding by using the attention mechanism between different streams to obtain expanded multi-streams. Compared with the HMM-DNN hybrid method and the general end-to-end method, the average F1 score of our approach for the two types of vowel reduction detection increased by 8.8% and 6.9%, respectively. The overall phoneme recognition rate increased by 5.8% and 5.0%, respectively. The experimental part further analyzes why the pronunciation prior knowledge auxiliary input is effective and the impact of different pronunciation prior knowledge types on performance.
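The attention-based fusion of a main and an auxiliary stream described above can be sketched with plain dot-product attention. This is an illustrative sketch, not the paper's exact equations: the function names (`attend`, `fuse`) and the element-wise additive fusion are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(states, query):
    """Dot-product attention: weight encoder states by similarity to query."""
    scores = [sum(a * b for a, b in zip(s, query)) for s in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]

def fuse(r1, r2):
    """Combine the two stream context vectors (here: element-wise sum)."""
    return [a + b for a, b in zip(r1, r2)]

# Context vector over two toy encoder states, queried by the first basis vector.
ctx = attend([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```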
... If a person wants to be successful in learning a new language, one of the most important qualities they can possess is the capacity for clear and concise communication. Learning a second language is thus one way to increase one's ability to communicate with others (Lee et al., 2017). Because of this, there is a greater need to practice helpful skills, such as giving presentations in public settings. ...
Article
Full-text available
The objective of this research is to identify the complexities faced by non-native learners in achieving accuracy and fluency in English language speaking at the BS level. Students studying English as a second or foreign language often encounter challenges in effectively communicating in English. Limited exposure to the language and lack of opportunities for expressive speaking hinder their ability to communicate proficiently. Additionally, educational practices and policies, such as an emphasis on grammar and academic discussions, often neglect the development of practical communication skills. This study utilized a quantitative research approach to collect data from 150 BS English students at Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan. Convenience sampling was employed, and a questionnaire was used as the research instrument. Reliability and validity of the instrument were assessed through statistical analysis. The research findings indicate a significant positive impact of non-native language learners on English speaking skills. Additionally, accuracy and fluency were found to have a strong and positive influence on English speaking skills. Based on the study's findings, it is recommended to prioritize effective teaching strategies that focus on enhancing students' oral communication skills, providing ample opportunities for interpersonal communication and public speaking practice. Curriculum reforms should strike a balance between fluency and accuracy in language teaching to promote effective communication in English among non-native learners.
... Acquisition of early reading dexterity is an integral part of children's education and second language learning. It is an effective means of drawing meaning from printed material and a route to foreign and second language learning (Uchidiuno et al., 2018; Lee et al., 2017; Beltrán-Planques & Querol-Julián, 2018; Schneider, 2019). Griffiths et al. (2016) stated that reading dramatically affects students' educational performance. ...
Article
Full-text available
English, as an international language, is a need of the time. Reading is one of the essential skills of language learning for non-native English learners. In recent years, the decline in reading among students has become a point of discussion among educators. In this article, we probe how the quality of English reading is gradually declining among non-native English learning students. For this purpose, we measured the reading abilities of grade 5 students with multiple languages as their mother tongue and broadly similar socioeconomic backgrounds. Data from students (N=589) were collected through convenience sampling. Students were assessed in three domains of reading (fluency, comprehension, and reading vocabulary) through the self-developed English Reading Achievement Scale (ERAS). The results demonstrated unsatisfactory performance in English reading: only 56 percent of students passed the reading assessment. Students at girls' schools read comparatively better than those at boys' schools, and performance was better in reading fluency and worst in reading vocabulary.
... The prosody of a language consists of intonation, rhythm, and stress. Each component can be expressed numerically as a sequence of pitch values, durations, and intensity values over smaller speech segments [12]. Although the values for each segment may be static, sequential information over a span of segments can be captured when the sequence is fed to a sequential training architecture, as mentioned in Section 2.2. ...
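The numeric description mentioned above can be represented directly: each speech segment carries one pitch, duration, and intensity value, and sentence-level prosody is the three aligned sequences. A small sketch with invented segment values:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Per-segment prosodic description (values here are illustrative)."""
    label: str
    pitch_hz: float
    duration_s: float
    intensity_db: float

def prosody_sequences(segments):
    """Split segments into aligned pitch, duration, and intensity sequences,
    ready to feed to a sequential model."""
    pitches = [s.pitch_hz for s in segments]
    durations = [s.duration_s for s in segments]
    intensities = [s.intensity_db for s in segments]
    return pitches, durations, intensities

segs = [Segment("sy1", 180.0, 0.12, 62.0), Segment("sy2", 140.0, 0.20, 55.0)]
pitches, durations, intensities = prosody_sequences(segs)
```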
Conference Paper
Full-text available
Telephonic fraud, or voice phishing, is becoming a major issue in South Korea, and there is high demand for AI-assisted solutions in forensics to effectively narrow down the pool of possible suspects. One such demand is automated Korean dialect identification for speaker profiling, which aims to classify the dialect candidates of a suspect by analyzing dialectal patterns in speech, found in both segmental and prosodic parts of the language. In this paper, an ensemble of dialect classifiers is proposed that considers both segmental and prosodic speech features. The classifier network is an ensemble of two sub-networks: an attention-based bidirectional LSTM network for prosodic feature learning and a vanilla DNN for segmental feature learning. On a public dataset of Korean conversational speech, the proposed model achieves an F-measure of 61.28%, outperforming the baseline by 25.87 percentage points.
... Minematsu et al. [186] extend this to six-way classification, also distinguishing between sentence stresses marking the beginning and end of a phrase. Lee et al. [160] perform lexical stress detection first, then combine the stress features of the stressed syllable of each word with lexical and syntactic features, specifically its identity w_i, a part-of-speech (POS) tag and a class tag (function word vs. content word) obtained from a sentence analyser [35], and the number of vowels and syllables it contains. The combined feature vector for each word is passed, along with the vectors for the two preceding and three following words, through a linear-chain Conditional Random Field (CRF) classifier, trained on a stress-annotated corpus, to detect whether each word is sentence-stressed. ...
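The context window described in the excerpt (two preceding and three following words) can be sketched as a simple padding-and-slicing helper. The `<PAD>` token and the function name are assumptions for illustration.

```python
def context_window(features, i, left=2, right=3, pad="<PAD>"):
    """Collect the feature vector of word i plus `left` preceding and
    `right` following words, padding beyond the sentence boundary."""
    window = []
    for j in range(i - left, i + right + 1):
        window.append(features[j] if 0 <= j < len(features) else pad)
    return window

# Toy per-word features (word/POS); a real system would use richer vectors.
feats = ["the/DT", "cat/NN", "sat/VB", "down/RB"]
w0 = context_window(feats, 0)  # first word: two left-pads
```

The resulting fixed-width windows are what a linear-chain CRF (or any fixed-input classifier) would consume per word.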
Thesis
Growing global demand for learning a second language (L2), particularly English, has led to considerable interest in automatic spoken language assessment, whether for use in computer-assisted language learning (CALL) tools or for grading candidates for formal qualifications. This thesis presents research conducted into the automatic assessment of spontaneous non-native English speech, with a view to providing meaningful feedback to learners. One of the challenges in automatic spoken language assessment is giving candidates feedback on particular aspects, or views, of their spoken language proficiency, in addition to the overall holistic score normally provided. Another is detecting pronunciation and other types of errors at the word or utterance level and feeding them back to the learner in a useful way. It is usually difficult to obtain accurate training data with separate scores for different views, and, as examiners are often trained to give holistic grades, single-view scores can suffer from consistency issues. Conversely, holistic scores are available for various standard assessment tasks such as Linguaskill. An investigation is thus conducted into whether assessment scores linked to particular views of the speaker's ability can be obtained from systems trained using only holistic scores. End-to-end neural systems are designed with structures and forms of input tuned to single views, specifically each of pronunciation, rhythm, intonation, and text. By training each system on large quantities of candidate data, individual-view information should be possible to extract. The relationships between the predictions of each system are evaluated to examine whether they are, in fact, extracting different information about the speaker. Three methods of combining the systems to predict the holistic score are investigated, namely averaging their predictions and concatenating and attending over their intermediate representations.
The combined graders are compared to each other and to baseline approaches. The tasks of error detection and error tendency diagnosis become particularly challenging when the speech in question is spontaneous, especially given the inconsistency of human annotation of pronunciation errors. An approach to these tasks is presented by distinguishing between lexical errors, wherein the speaker does not know how a particular word is pronounced, and accent errors, wherein the candidate's speech exhibits consistent patterns of phone substitution, deletion, and insertion. Three annotated corpora of non-native English speech by speakers of multiple L1s are analysed, the consistency of human annotation is investigated, and a method is presented for detecting individual accent and lexical errors and diagnosing accent error tendencies at the speaker level.
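Of the three combination methods the thesis names, prediction averaging is the simplest. A sketch (the view names and score values are invented for illustration):

```python
def average_holistic_score(view_scores):
    """Average the holistic-score predictions of the single-view graders
    (e.g. pronunciation, rhythm, intonation, text)."""
    return sum(view_scores.values()) / len(view_scores)

combined = average_holistic_score(
    {"pronunciation": 5.0, "rhythm": 6.0, "intonation": 5.5, "text": 6.5}
)
```

The other two methods (concatenating or attending over intermediate representations) require access to each grader's hidden states rather than just its final score, which is why they are usually implemented inside the network rather than as a post-hoc step like this one.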
... fully understand the pronunciation. The preprocessing based on the high-sensitivity acoustic wave sensor includes pre-emphasis of the English pronunciation error signal, framing and windowing, and endpoint detection [22]. After the test pronunciation and the standard pronunciation are preprocessed, feature extraction and pattern-matching calculation are performed. ...
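The preprocessing steps named in the excerpt (pre-emphasis, framing and windowing) are standard speech front-end operations. A minimal stdlib sketch, with the commonly used pre-emphasis coefficient of 0.97 assumed:

```python
import math

def preemphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts high frequencies before analysis."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame(signal, size, hop):
    """Split the signal into overlapping frames of `size` samples, `hop` apart."""
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, hop)]

def hamming(size):
    """Hamming window coefficients to apply to each frame."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]

y = preemphasis([1.0, 1.0, 1.0])          # constant input is flattened
frames = frame(list(range(10)), size=4, hop=2)
```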
Article
Full-text available
For a correction system for English pronunciation errors, the level of correction performance and the reliability, practicability, and adaptability of information feedback are the main bases for evaluating its overall performance. Given the disadvantages of traditional English pronunciation correction systems, such as failure to provide timely feedback on and correction of learners' pronunciation errors, slow improvement of learners' English proficiency, and even misleading learners, it is imperative to design a scientific and efficient automatic correction system for English pronunciation errors. High-sensitivity acoustic wave sensors can identify the English pronunciation error signal and convert the dimension of the collected pronunciation signal according to channel configuration information; the sensors can then assist the automatic correction system in filtering out interference components from the output signal, analyzing the real-time spectrum, and evaluating the sensitivity of the acoustic wave sensor.
Therefore, building on a summary and analysis of previous research, this paper: reviews the research status and significance of automatic correction systems for English pronunciation errors; outlines the background, current status, and future challenges of high-sensitivity acoustic wave sensor technology; introduces the methods and principles of time-domain signal amplitude measurement and pronunciation signal preprocessing; presents the optimized design of the pronunciation recognition sensor and the improved design of the pronunciation recognition processor; proposes the hardware and software design of the sensor-assisted automatic correction system, including the acquisition program design and the extraction of English pronunciation error signal parameters; and finally reports the system test and the analysis of its results. The study results show that the automatic correction system assisted by high-sensitivity acoustic wave sensors can automatically correct the amplitude linearity, sensitivity, repeatability error, and return error of English pronunciation errors, and offers robust automatic real-time data collection, processing, saving, query, and retesting. The system also minimizes external interference, improves the accuracy of the sensors' sensitivity calibration, and provides functions such as reading and saving English pronunciation error signals and visual operation, which effectively improves the ease of use and completeness of the correction system.
The study results in this paper provide a reference for further research on the design of automatic correction systems for English pronunciation errors assisted by high-sensitivity acoustic wave sensors. 1. Introduction In the process of learning English, some learners' spoken language is poor, and spoken language, as a critical and difficult part of English learning, has received increasing attention. Therefore, it is imperative to design a scientific and efficient automatic correction system for English pronunciation errors. Traditional English pronunciation correction systems cannot provide timely feedback and correction for learners' pronunciation errors and have disadvantages such as misleading learners and slow improvement of learners' English proficiency [1]. For an automatic correction system for English pronunciation errors, the level of correction performance and the reliability and practicability of information feedback are the main bases for evaluating its comprehensive performance. The quality of the correction algorithm determines the correction performance, and a reasonable error detection method guarantees it [2]. After decomposing and optimizing each subtarget of the multi-target problem, the high-sensitivity acoustic wave sensor trades off and coordinates the subtargets. This is because the input and output information required in the automatic correction of English pronunciation errors are related to an open failure system. The automatic correction system for English pronunciation errors can be divided into two parts: system training and pronunciation correction [3]. The training process of the system is similar to training in an automatic pronunciation recognition system: the known standard pronunciation information features are extracted and recorded as the standard for pronunciation correction.
Pronunciation correction assesses the accuracy of the pronunciation under test. The basic process is to extract the features of the test pronunciation, compare them with the standard pronunciation features, and calculate a score based on the similarity [4]. The high-sensitivity acoustic wave sensor can follow an artificial neural network model, use target tracking to design the automatic correction system, and form an abstract logic layer by combining the characteristics of English pronunciation errors. The single-target tracking algorithm is similar to a traditional neural network in that both use a hierarchical structure to construct the logic layer; the difference is that a three-layer construction mode is the most suitable for the automatic correction system [5]. Relying on the optimized design of the pronunciation recognition sensor and the improved design of the pronunciation recognition processor, the software design of the system is completed on the basis of the English pronunciation acquisition program and the extraction of English pronunciation error signal parameters. In this process, although the amount of data is large and the calculations are complicated, the calculation process for each sentence is the same [6]. Analog-to-digital signal conversion is needed to improve data sampling efficiency; the sampling efficiency must not fall below a certain value, and the single-target tracking algorithm performs repeated iterative calculations [7]. In pronunciation recognition, a multifrequency oscillator is designed to automatically calibrate pronunciation accuracy, while calibration of the circuit conversion is the key to converting the English pronunciation information mode. By collecting and controlling the original pronunciation information of the circuit, the accuracy of the system's automatic correction data can be improved [8].
The detailed chapters are arranged as follows: Section 2 introduces the methods and principles of time-domain signal amplitude measurement and pronunciation signal preprocessing; Section 3 proposes the hardware design of the automatic correction system based on high-sensitivity acoustic wave sensors; Section 4 discusses the software design of the system; Section 5 conducts the system test and analyzes its results; Section 6 concludes. 2. Methods and Principles 2.1.
Amplitude Measurement of the Time-Domain Signal From the perspective of the characteristics of the automatic correction system for English pronunciation errors, the assistance of the high-sensitivity acoustic wave sensor is in effect a system function used to obtain the required frequency response characteristics, and the same is true of digital filtering. For a linear time-invariant causal analog system, the relationship between its input and output is y(t) = h(t) * x(t) = ∫ h(τ) x(t − τ) dτ, where x(t) is the input of the system, y(t) is the output response of the system, t is the continuous time variable, h(t) is the transfer (impulse response) function of the system, and * is the convolution operator. For an input English phoneme q of the system, given the observation vector o_t of each frame of the i-th segment of pronunciation related to it, its frame-based posterior probability is calculated as P(q | o_t) = p(o_t | q) P(q) / Σ_q' p(o_t | q') P(q'), where p(o_t | q) is the probability distribution of the observation vector for a given phoneme q, P(q) is the prior probability of phoneme q, and the denominator is the sum over all text-independent phonemes. The design of the sensor-assisted automatic correction system for English pronunciation errors uses first-level calibration to measure the sensitivity of the standard acoustic wave sensor, so the final formula for the sensitivity of the sensor under test is S_x = S_0 (A_x / A_0), where S_0 is the sensitivity of the standard acoustic wave sensor, S_x is the sensitivity of the acoustic wave sensor to be measured, A_x is the amplitude of the acoustic wave sensor to be measured, and A_0 is the amplitude of the reference acoustic wave sensor. The development of the correction system first recognizes the English pronunciation error signal and then performs dimensional conversion on the collected English pronunciation error signal according to the channel configuration information. Then the high-sensitivity acoustic wave sensor is embedded in the correction system.
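The frame-based posterior used in this section is a direct application of Bayes' rule over the phoneme set. A sketch with invented likelihood and prior values:

```python
def frame_posterior(likelihoods, priors, phone):
    """P(phone | o) = p(o | phone) * P(phone) / sum over all phones,
    i.e. Bayes' rule with the denominator summed over the phoneme set."""
    total = sum(likelihoods[q] * priors[q] for q in likelihoods)
    return likelihoods[phone] * priors[phone] / total

# Two-phone toy example: p(o|ae)=0.2, p(o|ah)=0.1, uniform priors.
post = frame_posterior({"ae": 0.2, "ah": 0.1}, {"ae": 0.5, "ah": 0.5}, "ae")
```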
The measurement process first filters the pronunciation signal to remove interference components from the output signal of the acoustic wave sensor; the system then performs real-time spectrum analysis on the filtered English pronunciation error signal and evaluates the sensitivity of the acoustic wave sensor [9]. The system can therefore minimize external interference and improve the accuracy of sensor sensitivity calibration when there is interference in the on-site environment. In addition, the software provides auxiliary functions such as reading and saving the pronunciation error signal and operating the visualization area, improving the ease of use and completeness of the system. The system uses a controlled signal source and an oscilloscope to send and collect English pronunciation error signals. Because the number of measurements is limited, a loop control structure is added so that the sensitivity of the sensor under test is measured over a set number of cycles, and oscilloscope signal acquisition is added to the program to ensure the integrity of signal reception and finally realize channel triggering and channel reception. 2.2. Pronunciation Signal Preprocessing After the high-sensitivity acoustic wave sensor calculates the ratio of signal output to input, the system function can be obtained by Laplace transformation of that ratio. The acoustic wave sensor is designed by the impulse response method, and the general (Viterbi-style) recursion for pronunciation error correction is δ_t(i) = max_j [δ_{t−1}(j) a_{ji}] b_i(o_t), ψ_t(i) = argmax_j [δ_{t−1}(j) a_{ji}], where a_{ji} is the transition coefficient into the i-th state at time t, δ_t(i) is the cumulative output probability of the i-th state at time t, b_i(o_t) is the output probability of observation o_t in the i-th state, ψ_t(i) records the previous state number of the i-th state at time t, and backtracking through ψ yields the optimal state sequence.
Taking the logarithm of the posterior probability of phoneme q for each frame of the i-th segment of the English pronunciation error signal, the logarithmic posterior probability score of the phoneme over the i-th segment of pronunciation is obtained as GOP_i(q) = (1/d_i) Σ_t log P(q | o_t), where d_i is the duration of the i-th time period corresponding to phone q, 1/d_i is the normalization over that period, P(q | o_t) is the frame-level likelihood of the i-th time period of phone q, and the sum gives the final output score. The acoustic wave sensor-assisted automatic correction system treats English pronunciation errors as an ordinary pronunciation classification problem and uses a classification model to solve it. The model is based on a four-layer feed-forward network that includes a pronunciation vector mapping table; the forward calculation for an input-layer vector passed through the network is h^(l) = f(W^(l) h^(l-1) + b^(l)), where W^(l) is the network weight, b^(l) is the bias value, f is the activation function, and h^(l) is the output value of the corresponding layer; further hyperparameters include the learning rate, the dimension of the pronunciation error vector, and the size of the pronunciation error vector table. English pronunciation error preprocessing includes sampling of the English pronunciation errors and anti-aliasing band-pass filtering to remove individual pronunciation differences and noise caused by equipment and environment. The English pronunciation error signal is a non-stationary random process, so it requires short-time processing with the high-sensitivity acoustic wave sensor, together with primitive selection and endpoint detection for pronunciation recognition [10]. Endpoint detection refers to determining the start and end of pronunciation within the English pronunciation error signal and is an important part of preprocessing. The process of pronunciation recognition is a process of digitally processing English pronunciation errors.
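A hedged sketch of the layer-wise forward calculation f(W h + b) described above; the tiny weights, and the use of only two layers rather than four, are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def forward(x, layers):
    """Each layer computes h = f(W @ h + b) with activation f."""
    h = x
    for W, b, f in layers:
        h = f(W @ h + b)
    return h

# tiny illustrative network: 2 -> 3 (ReLU) -> 2 (softmax over classes)
layers = [
    (np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]), np.zeros(3), relu),
    (np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]), np.zeros(2), softmax),
]
out = forward(np.array([1.0, 2.0]), layers)
```

The softmax output layer yields a probability distribution over pronunciation classes, matching the classification framing in the excerpt.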
Before English pronunciation errors can be processed, they must be digitized; this process is analog-to-digital conversion. The conversion involves two steps, sampling and quantization, which yield signals that are discrete in both time and amplitude; pre-emphasis is usually performed after anti-aliasing filtering and before further transformation. After the system obtains the learner's follow-up pronunciation, it extracts its features, calculates the similarity between it and the standard pronunciation in the test question bank, and finally maps the similarity to a grade score that is easier for the learner to understand and accept.
3. Hardware Design of the Automatic Correction System Based on High-Sensitivity Acoustic Wave Sensors
3.1. Optimization Design of Pronunciation Recognition Sensors
To ensure the accuracy, reliability, consistency, and adaptability of English pronunciation error correction, and to keep pace with the trend toward automatic correction, the hardware design must effectively supervise the accuracy and reliability of the acoustic wave sensor's measurement-value transmission so as to standardize and perfect sensor calibration. The main components include a grating-ruler data conditioning module, a conditioning module for the output of the sensor to be calibrated, an acquisition device, and a computer system. These modules respectively correct the amplitude linearity, sensitivity, repeatability error, and return error of English pronunciation error measurements, and provide automatic real-time data collection, data processing, storage, query, and re-measurement. The grating ruler's output is converted into the corresponding electrical signal by its conditioning circuit, processed by the formant acquisition card, and the English pronunciation signal is input into the system.
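A minimal, assumption-laden sketch of two preprocessing steps named above, pre-emphasis and energy-based endpoint detection (the filter coefficient, frame length, and threshold are arbitrary illustrative choices):

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def endpoints(x, frame_len, threshold):
    """Crude endpoint detection: first and last frame whose
    short-time energy exceeds the threshold, or None if silent."""
    n = len(x) // frame_len
    energy = np.array([np.sum(x[i * frame_len:(i + 1) * frame_len] ** 2)
                       for i in range(n)])
    voiced = np.where(energy > threshold)[0]
    return (int(voiced[0]), int(voiced[-1])) if voiced.size else None

y = preemphasis(np.array([1.0, 1.0, 1.0]))
# silence - "speech" - silence; frames of 50 samples
x = np.concatenate([np.zeros(100), np.ones(200), np.zeros(100)])
start, end = endpoints(x, 50, 1.0)
```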
The sensor to be calibrated outputs the corresponding voltage or current through its conditioning circuit to the data acquisition device. Serial communication is established over the interface, the data acquisition device is placed in the reset state, the trigger condition is established, and the control settings are initialized; the model can then enter the working state, open the serial port, and input the output signal into the computer system, which finally analyzes and processes the collected data [11]. Figure 1 shows the design framework of the automatic correction system for English pronunciation errors assisted by high-sensitivity acoustic wave sensors.
... With the stress prediction and detection functions, learners can clearly notice whether they placed the stress in the correct position and whether they realized it with the proper pitch pattern, without any delay. The results of this experiment show that learners' accuracy in rhythm and fluency improved after training with this system [6]. ...
Conference Paper
Full-text available
Abstract—The present study investigates how a multimodal training method contributes to the improvement of L2 intonation produced by Chinese EFL learners. Altogether 75 learners with an English-major background from 3 different dialectal regions of China were recruited. They were divided into 5 groups that differ in training method: the control group (G1), a group trained with sound only (G2), a group with sound and after-training feedback (G3), a group with both audio and visual training material (G4), and an audiovisual training group with feedback (G5). Although no group showed a significant improvement from pretest to posttest, some learners in the experimental groups scored significantly higher in the posttest than those in the control group; among them, G5 performed best, with the most cases of intonation improved through training. This indicates that the multimodal + supervised training method was the most effective approach to L2 intonation teaching in this experiment. The lack of clear improvement in the remaining cases may be due to the limited training time, which will be addressed by supplementary intensive training with this method.
... For articulated segments of speech (rather than periods of silence), durational measures are actually closely related to F0 and energy. When syllables are stretched or elongated, it is usually the case that the speaker is emphasizing that syllable, and at the same time the syllable is emphasized with changing pitch and/or energy (Lee, Lee, Song, Kim, Kang, Lee, & Hwang, 2017). Thus, lengthening of segmental duration tends to co-occur with changing pitch and energy. ...
Chapter
Full-text available
This chapter discusses the operationalization and scoring of pronunciation constructs using automatic speech recognition (ASR) systems using constrained tasks. It begins by distinguishing between computer-assisted pronunciation training (CAPT) pronunciation remediation systems and ASR pronunciation assessment systems. The chapter describes how the systems are developed and how proficient or native reference speakers can be used as a model against which to compare learner pronunciations. It illustrates how features of speech are extracted and weighted to score sub-constructs of pronunciation such as word sounds, stress, and intonation. The chapter looks ahead to future possible uses of this assessment technology, through the lens of English as an International Language (EIL). Areas where more improvements are needed include the ability to score pronunciation ability on unconstrained, spontaneous speech, versus the read aloud or constrained speech that has been much of the focus of the chapter.
... The complex relation between prominence and linguistic factors (e.g., parts-of-speech, rhythm, discourse meaning) may raise difficulties in learning prosody for Korean learners of English (Im, 2019;Lee et al., 2017;Um et al., 2001, among others). Im (2019) investigated the perception of prominence by Korean learners of English and native English speakers. ...
Chapter
Full-text available
The paired and group oral assessment formats involve candidates interacting together to perform a task while one or more examiners observe their performances and rate their language proficiency. Keywords: discourse analysis; interaction; assessment
Article
Full-text available
Kim, Rakhun. (2018). A critical review of the impact of the fourth industrial revolution on the development of the basic communicative competence of Korean EFL learners. Multimedia-assisted Language Learning, 21(3), 115-148. The purpose of this study is to theoretically evaluate the impact of the Fourth Industrial Revolution (e.g., machine learning (ML) and deep learning (DL)) on English education in South Korea. Few studies have investigated the extent to which ML/DL technologies have instructional potential for Korean English learners' development of English proficiency. To this end, this study addresses four research issues by extensively reviewing previous literature on computer science and SLA theories. First, it introduces several well-known concepts and architectures of ML/DL for the discussions that follow. Second, it critically examines the opinions of those who claim that recent progress in machine translation can dramatically reduce the need for foreign language learning. Third, it highlights the significance of basic communicative competence in English learning contexts in South Korea, which, as specified by the components and principles of construction grammar, enables Korean English learners to generate sentence-level utterances without resorting to memorized formulaic expressions. Finally, it presents three types of English learning applications built upon ML/DL techniques whose validity is evaluated from the perspective of basic communicative competence. As a final remark, this study suggests that ML/DL techniques guided by the principles and components of construction grammar should be applied to Korean English learning contexts as a way to develop learners' basic communicative competence in English.
Chapter
Modern Computer Assisted Language Learning (CALL) systems use speech recognition to give students the opportunity to build up their spoken language skills through interactive practice with a mechanical partner. Besides the obvious benefits that these systems can offer, e.g. flexible and inexpensive learning, user interaction in this context can often be problematic. In this article, the authors introduce a parallel layer of feedback in a CALL application, which can monitor interaction, report errors and provide advice and suggestions to students. This mechanism combines knowledge accumulated from four different inputs in order to decide on appropriate feedback, which can be customized and adapted in terms of phrasing, style and language. The authors report the results from experiments conducted at six lower secondary classrooms in German-speaking Switzerland with and without this mechanism. After analyzing approximately 13,000 spoken interactions it can be reasonably argued that their parallel feedback mechanism in L2 actually does help students during interaction and contributes as a motivation factor.
Conference Paper
Full-text available
In this paper we discuss a preliminary study, one of a series of studies conducted to design a computer software system that helps spoken-English learners educate themselves. The system detects English syllable stress and uses the results to guide prospective learners toward successful spoken English. When learning another language, learners confront a number of problems, as each language is unique and carries its own particular features. Learning to speak a new language is more than just learning words, phrases, and sentences. The difficulties grow when an adult student whose native tongue is a syllable-timed language, such as Mandarin, attempts to learn a stress-timed spoken language such as English. In a stress-timed language, producing utterances with correct word stress, and thereby correct sentence stress, is usually a problem for adult learners. Using the right sentence stress is key to proper communication, and for a learner with a syllable-timed native tongue, mistakes in this area are very common. Accordingly, a case study was conducted with 50 final-year university undergraduates whose native tongue is syllable-timed (Sinhalese or Tamil) and who learn English, aiming to spot the difficulties they face in using correct intonation while learning another language. The results of the study indicate that these students of spoken English have a considerable number of sentence stress problems.
Article
Full-text available
The purpose of this study is to examine native-speaker (NS) and non-native speaker (NNS) comprehensibility and accentedness and to identify the factors that may cause listeners to rate speech in certain ways. Think-aloud protocols, in which each rater vocalized their thought processes, were used to understand what aspects of speech and pronunciation six raters noticed while rating seven speech samples for comprehensibility and accentedness. We found both similarities and differences between the factors noticed when rating for accentedness and for comprehensibility. In addition, the NS and NNS raters showed some major differences in the aspects mentioned during think-aloud.
Article
Full-text available
An automated reading tutor that models and evaluates children's oral reading prosody should also be able to respond dynamically with feedback they like, understand, and benefit from. We describe visual feedback that Project LISTEN's Reading Tutor generates in realtime by mapping prosodic features of children's oral reading to dynamic graphical features of displayed text. We present results from preliminary usability studies of 20 children aged 7-10. We also describe an experiment to test whether such visual feedback elicits oral reading that more closely matches the prosodic contours of adult narrations. Effective feedback on prosody could help children become fluent, expressive readers. Index Terms: prosody, visual feedback, intelligent tutoring systems, children, speech technology for education.
Article
Full-text available
Prosody plays an important role in speech communication between humans. Although several computer-assisted language learning (CALL) systems with an utterance evaluation function have been developed, the accuracy of their prosody evaluation is still poor. In the present paper, we develop new methods by which to evaluate the rhythm and intonation of English sentences uttered by Japanese learners. The novel features of our study are as follows: (1) new prosodic features are added to traditional features, and (2) word importance factors are introduced in the calculation of the intonation score. The word importance factor is automatically estimated using the ordinary least squares method and is optimized based on word clusters generated by a decision tree. Experiments reveal that the correlation coefficient (±1.0 denotes the best correlation) between the rhythm score given by native speakers and the system was −0.55, whereas a conventional feature (pause insertion error rate) gave a correlation coefficient of only −0.11. The correlation coefficient between the intonation scores given by native speakers and the system was only −0.29; however, the word importance factor with decision tree clustering improved it to 0.45. In addition, we propose a method of integrating the rhythm score with the intonation score, which improved the correlation coefficient from 0.45 to 0.48 for evaluating intonation.
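The word-importance idea above can be sketched with ordinary least squares: fit per-word (or per-cluster) weights so that the weighted sum of word-level intonation scores best predicts human ratings. This is an illustrative reduction under assumed inputs, not the paper's exact procedure or data:

```python
import numpy as np

def fit_word_importance(per_word_scores, human_scores):
    """OLS estimate of importance weights w minimizing ||X w - y||^2,
    where row i of X holds utterance i's per-word-cluster scores."""
    X = np.asarray(per_word_scores, float)
    y = np.asarray(human_scores, float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def intonation_score(per_word, w):
    """Importance-weighted sum of per-word intonation scores."""
    return float(np.asarray(per_word, float) @ w)

# consistent toy system whose exact solution is w = [2, 3]
w = fit_word_importance([[1, 0], [0, 1], [1, 1]], [2, 3, 5])
```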
Article
Full-text available
While current TTS systems can deliver quite acceptable segmental quality of synthesized speech for voice user interface applications, their prosody is still perceived by users as "robotic" or inexpressive. In this paper, we investigate how to improve TTS prosody prediction and detection. Conditional Random Fields (CRFs), a discriminative probabilistic model for labeling sequential data, are adopted. Rich syntactic, acoustic, and contextual features are used in building the CRF models. Experiments performed on the Boston University Radio Speech Corpus show that CRF models trained on our proposed rich contextual features can improve the accuracy of prosody prediction and detection in both speaker-dependent and speaker-independent cases. The performance is comparable to or better than the best reported results.
Article
Full-text available
We evaluate two types of prosodic features, computed from automatically generated stress and tone labels for non-native read speech, in terms of their applicability to automated speech scoring. Neither type of feature has been used in the context of automated scoring of non-native read speech to date. In our first experiment, we compute features based on a positional match between stress and tone labels automatically identified for 741 non-native read text passages and a human gold standard on the same texts read by a native speaker. Pearson correlations of up to r=0.54 between these features and human proficiency scores are observed. In our second experiment, we use stress and tone labels of the same non-native read speech corpus to compute derived features of rhythm and relative frequencies, which again are correlated with human proficiency scores. Pearson correlations of up to r=-0.38 are observed.
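Correlating a candidate prosodic feature with human proficiency scores, as in both experiments above, reduces to a Pearson correlation; a self-contained sketch with toy values (the data below are made up for illustration):

```python
import numpy as np

def pearson_r(feature, scores):
    """Pearson correlation between a prosodic feature and human scores."""
    x = np.asarray(feature, float) - np.mean(feature)
    y = np.asarray(scores, float) - np.mean(scores)
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))

r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly aligned
r_neg = pearson_r([1, 2, 3], [3, 2, 1])         # perfectly inverted
```

A negative r, as for the rhythm features reported above (r = -0.38), simply means higher feature values go with lower proficiency scores.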
Article
Full-text available
Identified changes in 48 nonnative speakers' (NNSs) pronunciation over a period of 12 weeks as a result of the type of instruction they received--global, segmental, and no specific pronunciation instruction. Implications for pronunciation instruction are drawn from the results. (Author/VWL)
Article
Full-text available
We recorded non-native English productions of 55 speakers; a subset of these productions was assessed by 60 native English speakers for quality with respect to intelligibility, rhythm, etc. Applying multiple linear regression to a large prosodic feature vector, modelling approaches known from the literature as well as generic prosody, we can automatically predict the listeners' assessments with correlations of up to .85. We discuss the most important features and the limitations of this approach.
Conference Paper
Full-text available
Speech rhythm measurements have been used in a limited number of previous studies on automated speech assessment, an approach that uses speech recognition technology to judge non-native speakers' proficiency levels. However, one of the most problematic issues in these previous studies is the lack of comparison between these rhythm features and other effective non-rhythm features identified in a decade of previous research. In this paper, we extracted both non-rhythm and rhythm features and compared their performance in predicting proficiency scores rated by humans. We show that adding rhythm features significantly improves the performance of a scoring model based only on non-rhythm features.
Conference Paper
Full-text available
Predicting the degree of nativeness of a student utterance is an important issue in computer-aided language learning. This task has been addressed by many studies focusing on segmental assessment of the speech signal. To achieve improved correlations between human and automatic nativeness scores, other aspects of speech should also be considered, such as prosody. The goal of this study is to evaluate the use of prosodic information to help predict the degree of nativeness of pronunciation, independent of the text. A supervised strategy based on human grades is used in an attempt to select promising features for this task. Preliminary results show improvements in the correlation between human and automatic scores.
Conference Paper
Full-text available
Automatic prosodic event detection is important for both speech understanding and natural speech synthesis, since prosody provides information beyond the short-term segmental features and lexical representation of an utterance. As in previous work, this paper focuses on automatic detection of coarse-level representations of pitch accents, intonational phrase boundaries (IPBs), and break indices. We exploit various classifiers and identify effective feature sets to improve the performance of prosodic event detection from acoustic, lexical, and syntactic evidence. Our experiments on the Boston University Radio News Corpus show that a neural network classifier achieves the best performance for modeling acoustic evidence, and that support vector machines are more effective for lexical and syntactic evidence. The combination of the acoustic and syntactic models yields 89.8% accent detection accuracy, 93.3% IPB detection accuracy, and 91.1% break index detection accuracy. Compared with previous work, the IPB performance is similar, whereas the results for accent and break index detection are significantly better.
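A hedged illustration of combining evidence from separate models, here as a simple score-level interpolation (the cited work combines trained neural network and SVM classifiers; the weight and probabilities below are made up for illustration):

```python
def fuse(p_acoustic, p_syntactic, w=0.5):
    """Linear interpolation of two model posteriors for one prosodic
    event (e.g. pitch accent present vs. absent on a syllable)."""
    return w * p_acoustic + (1.0 - w) * p_syntactic

p = fuse(0.8, 0.6)                       # fused posterior
label = "accent" if p > 0.5 else "none"  # threshold decision
```

More elaborate fusion schemes (e.g. weights tuned on held-out data, or a meta-classifier over both scores) follow the same combine-then-decide pattern.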
Article
Full-text available
The field of spoken dialog systems is a rapidly growing research area, because improvements in speech technologies have made it possible to build systems that humans can easily operate to access useful information via spoken language. Among the components of a spoken dialog system, dialog management plays major roles such as discourse analysis, database access, error handling, and system action prediction. This survey covers design issues and recent approaches to dialog management techniques for modeling dialogs. We also explain user simulation techniques for the automatic evaluation of spoken dialog systems.
Article
Full-text available
This paper describes techniques for scoring the prosodic proficiency of English sentences spoken by Japanese learners. A multiple regression model predicts prosodic proficiency using new prosodic measures based on the characteristics of Japanese novice learners of English. Prosodic measures are calculated by comparing prosodic parameters, such as F0, power, and duration, of the learner's and a native speaker's speech. The new measures include the approximation error of the fitting line and the comparison of prosodic parameters over a limited segment around the word boundary rather than the whole utterance. This paper shows that the introduction of the new measures improved the correlation between the teachers' scores and the automatic scores by 0.1.
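One way to sketch the comparison of learner and native prosodic parameters is a distance over normalized contours; the z-normalization and mean-absolute-difference below are illustrative assumptions, not the paper's exact measures:

```python
import numpy as np

def prosodic_distance(learner, native):
    """Mean absolute difference of z-normalized contours, so a learner
    contour that differs from the native one only in overall level or
    range (e.g. F0 register) scores as identical."""
    z = lambda v: (v - v.mean()) / v.std()
    a = z(np.asarray(learner, float))
    b = z(np.asarray(native, float))
    return float(np.mean(np.abs(a - b)))

# contours differing only by scale get distance 0 after normalization
d = prosodic_distance([1, 2, 3, 4], [2, 4, 6, 8])
```

Per-parameter distances like this (one each for F0, power, and duration) could then feed a regression model that predicts the human proficiency score.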
Article
Native speakers of Japanese learning English generally have difficulty differentiating the phonemes /r/ and /l/, even after years of experience with English. Previous research that attempted to train Japanese listeners to distinguish this contrast using synthetic stimuli reported little success, especially when transfer to natural tokens containing /r/ and /l/ was tested. In the present study, a different training procedure that emphasized variability among stimulus tokens was used. Japanese subjects were trained in a minimal pair identification paradigm using multiple natural exemplars contrasting /r/ and /l/ from a variety of phonetic environments as stimuli. A pretest-posttest design containing natural tokens was used to assess the effects of training. Results from six subjects showed that the new procedure was more robust than earlier training techniques. Small but reliable differences in performance were obtained between pretest and posttest scores. The results demonstrate the importance of stimulus variability and task-related factors in training nonnative speakers to perceive novel phonetic contrasts that are not distinctive in their native language.
Article
In the United States, the Communicative Approach has been the focus of much intellectual debate resulting in numerous studies examining the acquisition of the four language skills. Although the acquisition of certain morphological structures and discourse strategies have received attention, studies on the acquisition of target language pronunciation have lagged behind. Recent research examining phonological instruction indicates that improvement in pronunciation for adult foreign language learners is possible by employing a multimodal methodology designed to account for individual learning style variation. An extension of this research examines experimental subjects' overall improvement in pronunciation accuracy, pinpoints specific areas where pronunciation instruction appears to be most beneficial (e.g., discrete-word repetition, sentence repetition, discrete-word reading, and free speech); and determines natural phoneme classes and specific allophones that improved as a result of phonological instruction. The findings have implications for current communicative approaches.
Article
One of the chief goals of most second language learners is to be understood in their second language by a wide range of interlocutors in a variety of contexts. Although a nonnative accent can sometimes interfere with this goal, prior to the publication of this study, second language researchers and teachers alike were aware that an accent itself does not necessarily act as a communicative barrier. Nonetheless, there had been very little empirical investigation of how the presence of a nonnative accent affects intelligibility, and the notions of "heavy accent" and "low intelligibility" had often been confounded. Some of the key findings of the study—that even heavily accented speech is sometimes perfectly intelligible and that prosodic errors appear to be a more potent force in the loss of intelligibility than phonetic errors—added support to some common, but weakly substantiated beliefs. The study also provided a framework for a program of research to evaluate the ways in which such factors as intelligibility and comprehensibility are related to a number of other dimensions. The authors have extended and replicated the work begun in this study to include learners representing other L1 backgrounds (Cantonese, Japanese, Polish, Spanish) and different levels of learner proficiency, as well as other discourse types (Derwing & Munro, 1997; Munro & Derwing, 1995). Further support for the notion that accent itself should be regarded as a secondary concern was obtained in a study of processing difficulty (Munro & Derwing, 1995), which revealed that nonnative utterances tend to require more time to process than native-produced speech, but failed to indicate a relationship between strength of accent and processing time. The approach to L2 speech evaluation used in this study has also proved useful in investigations of the benefits of different methods of teaching of pronunciation to ESL learners.
In particular, it is now clear that learner assessments are best carried out with attention to the multidimensional nature of L2 speech, rather than with a simple focus on global accentedness. It has been shown, for instance, that some pedagogical methods may be effective in improving intelligibility while others may have an effect only on accentedness (Derwing, Munro, & Wiebe, 1998).
Conference Paper
To improve the English proficiency of Korean learners, we design a system for pitch accents, which consists of prediction, detection and feedback parts. The prediction and detection parts adopt Conditional Random Field models to achieve a prediction accuracy of 87.25%, which is based on the Boston University radio news corpus, and a detection accuracy of 81.21%, which is based on the Korean Learner's English Accentuation corpus. In the learner experiment with our system, learners' pitch accent proficiency, as assessed by English experts, was improved from 2.67 to 3.25 on a scale of 1-to-5, and the accuracy of not-wrong feedback was measured at 82.77%. The learners assessed the learning effectiveness of our system at 4.3 on a scale of 1-to-5.
Conference Paper
To address the problem that the reflective intensity-modulated fiber-optic displacement sensor is easily influenced by luminous power and external vibrations, this paper first introduces the theory of the reflective intensity-modulated fiber displacement sensor. A displacement sensor based on two-circle reflective coaxial fiber is then presented, and the compensating characteristics of the sensor are investigated. Furthermore, a method that uses a circuit to achieve the compensation is put forward, and the circuit is designed. Finally, the stability of the fiber sensor and circuit, which provide a new method for compensation, is validated. The results show that the two-circle coaxial fiber displacement sensor can eliminate the effects caused by vibration of the fiber, so it has good compensation properties.
Article
This study examined native English speakers' reactions to nonnative primary stress in English discourse. I measured North American undergraduate students' processing, comprehension, and evaluations of three versions of an international teaching assistant's speech: with primary stress correctly placed, incorrectly placed, or missing entirely. Results indicated that when listening to speech with correct primary stress, the participants recalled significantly more content and evaluated the speaker significantly more favorably than when primary stress was aberrant or missing. Listeners also tended to process discourse more easily when primary stress was correct, but the result was not significant. These findings provide insights into how the use of primary stress affects international TAs' intelligibility. They also provide empirical support and suggest new ideas for current pedagogical practices that emphasize suprasegmentals in teaching pronunciation.
Article
The supposition that French is "syllable-timed" is examined and found, in the light of instrumental and perceptual evidence, to deny just those facts that make it possible to speak and understand the language. An attempt is made to discover those features of linguists' conceptual and perceptual systems that conspire to straitjacket standard French into such a framework.
Article
This study introduces the educational assistant robots that we developed for foreign language learning and explores the effectiveness of robot-assisted language learning (RALL) which is in its early stages. To achieve this purpose, a course was designed in which students have meaningful interactions with intelligent robots in an immersive environment. A total of 24 elementary students, ranging in age from ten to twelve, were enrolled in English lessons. A pre-test/post-test design was used to investigate the cognitive effects of the RALL approach on the students’ oral skills. No significant difference in the listening skill was found, but the speaking skills improved with a large effect size at the significance level of 0.01. Descriptive statistics and the pre-test/post-test design were used to investigate the affective effects of RALL approach. The result showed that RALL promoted and improved students’ satisfaction, interest, confidence, and motivation at the significance level of 0.01.
Article
This study investigated the relationship between experienced SPEAK Test raters' judgments of nonnative pronunciation and actual deviance in segmentals, prosody, and syllable structure. Sixty reading passage speech samples from SPEAK Test tapes of speakers from 11 language groups were rated impressionistically on pronunciation and later analyzed for deviance in segmentals, prosody, and syllable structure. The deviance found in each area of pronunciation was then correlated with the pronunciation ratings using Pearson correlations and multiple regression. An analysis of the 60 speakers showed that whereas deviance in segmentals, prosody, and syllable structure all showed a significant influence on the pronunciation ratings, the prosodic variable proved to have the strongest effect. When separate analyses were done on two language subgroups within the sample, prosody was always found to be significantly related to the global ratings, whereas this was not always true for the other variables investigated.
Article
We had native English-speaking (NS) listeners evaluate the effects of 3 types of instruction (segmental accuracy; general speaking habits and prosodic factors; and no specific pronunciation instruction) on the speech of 3 groups of English as a second language (ESL) learners. We recorded their sentences and extemporaneously produced narratives at the beginning and end of a 12-week course of instruction. In a blind rating task, 48 native English listeners judged randomized sentences for accentedness and comprehensibility. Six experienced ESL teachers evaluated narratives for accent, comprehensibility, and fluency. Although both groups instructed in pronunciation showed significant improvement in comprehensibility and accentedness on the sentences, only the global group showed improvement in comprehensibility and fluency in the narratives. We argue that the focus of instruction and the attentional demands on speakers and listeners account for these findings.
Conference Paper
We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
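Decoding in a linear-chain model of this kind can be illustrated with a tiny Viterbi search. The sketch below is not from the paper: the labels (echoing a stress-detection use), emission scores, and transition scores are all invented for illustration; a trained CRF would learn such scores from features of the data.

```python
# Minimal Viterbi decoding for a linear-chain sequence model: given
# per-position label scores (emissions) and label-to-label transition
# scores, recover the highest-scoring label sequence. All scores here
# are illustrative, hand-set values.

def viterbi(emissions, transitions, labels):
    """emissions: list of {label: score}; transitions: {(prev, cur): score}."""
    # best[i][y] = (score of best path ending in y at position i, backpointer)
    best = [{y: (emissions[0][y], None) for y in labels}]
    for i in range(1, len(emissions)):
        row = {}
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p][0] + transitions[(p, y)] + emissions[i][y])
                 for p in labels),
                key=lambda t: t[1])
            row[y] = (score, prev)
        best.append(row)
    # Backtrace from the best final label.
    y = max(labels, key=lambda l: best[-1][l][0])
    path = [y]
    for i in range(len(emissions) - 1, 0, -1):
        y = best[i][y][1]
        path.append(y)
    return list(reversed(path))

labels = ["STRESS", "NOSTRESS"]
emissions = [
    {"STRESS": 2.0, "NOSTRESS": 0.5},
    {"STRESS": 0.2, "NOSTRESS": 1.5},
    {"STRESS": 1.8, "NOSTRESS": 0.3},
]
transitions = {
    ("STRESS", "STRESS"): -1.0, ("STRESS", "NOSTRESS"): 0.5,
    ("NOSTRESS", "STRESS"): 0.5, ("NOSTRESS", "NOSTRESS"): 0.0,
}
print(viterbi(emissions, transitions, labels))
# → ['STRESS', 'NOSTRESS', 'STRESS']
```

The negative STRESS-to-STRESS transition score is what a CRF buys over an independent per-word classifier: it lets the model discourage adjacent stressed words globally rather than deciding each position in isolation.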
Article
This study examines the interrelationships among accentedness, perceived comprehensibility, and intelligibility in the speech of L2 learners. Eighteen native speakers (NSs) of English listened to excerpts of extemporaneous English speech produced by 10 Mandarin NSs and two English NSs. We asked the listeners to transcribe the utterances in standard orthography and to rate them for degree of foreign-accentedness and comprehensibility on 9-point scales. We assigned the transcriptions intelligibility scores on the basis of exact word matches. Although the utterances tended to be highly intelligible and highly rated for comprehensibility, the accent judgment scores ranged widely, with a noteworthy proportion of scores at the "heavily-accented" end of the scale. We calculated Pearson correlations for each listener's intelligibility, accentedness, and comprehensibility scores and the phonetic, phonemic, and grammatical errors in the stimuli, as well as goodness of intonation ratings. Most listeners showed significant correlations between accentedness and errors, fewer listeners showed correlations between accentedness and perceived comprehensibility, and fewer still showed a relationship between accentedness and intelligibility. The findings suggest that although strength of foreign accent is correlated with perceived comprehensibility and intelligibility, a strong foreign accent does not necessarily reduce the comprehensibility or intelligibility of L2 speech.
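Scoring a transcription by exact word matches can be sketched as a multiset overlap between intended and transcribed words. This is only one plausible reading of "exact word matches": the tokenization, case handling, and position-insensitive matching below are assumptions, not the study's documented procedure.

```python
# Toy intelligibility score: fraction of intended words reproduced
# exactly in the listener's transcription (multiset intersection,
# so repeated words only count up to their multiplicity).

from collections import Counter

def intelligibility(intended, transcribed):
    ref = Counter(intended.lower().split())
    hyp = Counter(transcribed.lower().split())
    matched = sum((ref & hyp).values())  # & = multiset minimum
    return matched / max(1, sum(ref.values()))

# "a" does not match "the", so 5 of the 6 intended words survive.
print(intelligibility("the cat sat on the mat", "a cat sat on the mat"))
```

A stricter variant could require matches in order (e.g. via an edit-distance alignment), which would penalize transpositions that this multiset version ignores.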
Conference Paper
Prosody can be used to infer whether or not candidates fully understand a passage they are reading aloud. In this paper, we focused on automatic assessment of prosody in a read-aloud section for a high-stakes English test. A new method was proposed to handle fundamental frequency (F0) of unvoiced segments that significantly improved the predictive power of F0. The k-means clustering method was used to build canonical contour models at the word level for F0 and energy. A direct comparison between the candidate's contours and ideal contours gave a strong prediction of the candidate's human prosody rating. Duration information at the phoneme level was an even better predictive feature. When the contours and duration information were combined, the correlation coefficient r = 0.80 was obtained, which exceeded the correlation between human raters (r = 0.75). The results support the use of the new methods for evaluating prosody in high-stakes assessments.
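The idea of canonical contour models can be sketched with a tiny k-means over word-level F0 vectors: cluster reference contours, then score a candidate by its nearest centroid. Everything here is an illustrative assumption (the data, k = 2, squared-Euclidean distance, fixed initial centroids); the paper's feature extraction and normalization are not reproduced.

```python
# Build "canonical" word-level contour models with a minimal k-means,
# then find the centroid nearest to a candidate contour. Contours are
# length-3 vectors of invented, roughly normalized F0 values.

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

rising = [[0.0, 0.5, 1.0], [0.1, 0.6, 1.1], [0.0, 0.4, 0.9]]
falling = [[1.0, 0.5, 0.0], [1.1, 0.6, 0.1], [0.9, 0.4, 0.0]]
centroids = kmeans(rising + falling, centroids=[rising[0], falling[0]])

candidate = [0.05, 0.5, 1.0]  # a roughly rising contour
nearest = min(centroids, key=lambda c: dist2(candidate, c))
print([round(x, 2) for x in nearest])  # the "rising" canonical contour
```

The distance to the nearest canonical contour (here `dist2(candidate, nearest)`) is the kind of quantity one could feed into a rating predictor alongside duration features.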
Conference Paper
In this paper, we propose a novel method for evaluating the intonation of an English utterance spoken by a learner, for intonation training in a CALL system. The proposed method is based on an intonation evaluation method proposed by Suzuki et al., which uses "word importance factors" calculated from word clusters given by a decision tree. We extended Suzuki's method so that multiple decision trees are used and the resulting intonation scores are combined using multiple regression. In an experiment, we obtained a correlation coefficient comparable to the correlation between human raters.
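Combining per-tree intonation scores with multiple regression amounts to a least-squares fit of human ratings on the score columns. The sketch below uses two hypothetical tree scores, no intercept, and invented data constructed so the true weights are 0.5 each; the paper's actual features, tree construction, and regression setup are not reproduced.

```python
# Least-squares combination y ≈ w1*x1 + w2*x2 via the 2x2 normal
# equations, solved in closed form. x1, x2 stand in for intonation
# scores from two decision trees; y for human ratings.

def fit_two(x1, x2, y):
    a = sum(v * v for v in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    e = sum(u * v for u, v in zip(x1, y))
    f = sum(u * v for u, v in zip(x2, y))
    det = a * d - b * b
    return (e * d - b * f) / det, (a * f - b * e) / det

tree1 = [0.9, 0.4, 0.7, 0.2]            # hypothetical per-utterance scores
tree2 = [0.8, 0.5, 0.6, 0.3]
human = [0.85, 0.45, 0.65, 0.25]        # here exactly the mean of the two

w1, w2 = fit_two(tree1, tree2, human)
print(round(w1, 3), round(w2, 3))
```

With more trees this becomes a general linear least-squares problem, which in practice one would hand to a library solver rather than expand by hand as above.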
Conference Paper
We describe an automated method to assess the expressiveness of children's oral reading by measuring how well its prosodic contours correlate in pitch, intensity, pauses, and word reading times with adult narrations of the same sentences. We evaluate the method directly against a common rubric used to assess fluency by hand. We also compare it against manual and automated baselines by its ability to predict fluency and comprehension test scores and gains of 55 children ages 7-10 who used Project LISTEN's Reading Tutor. It outperforms the human-scored rubric, predicts gains, and could help teachers identify which students are making adequate progress.
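Correlating a child's prosodic contour against an adult narration can be sketched with a plain Pearson coefficient over word-level values. The pitch numbers below are invented and only one feature is shown; the paper combines pitch, intensity, pauses, and word reading times.

```python
# Score expressiveness as the Pearson correlation between a reader's
# word-level pitch contour and an adult narration of the same sentence.
# A contour that tracks the adult's rises and falls correlates highly;
# a monotone reading does not.

from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

adult_pitch = [220, 240, 210, 260, 200]   # word-level F0 means (Hz), invented
fluent_child = [230, 250, 215, 270, 205]  # tracks the adult contour
flat_child = [225, 224, 226, 225, 224]    # monotone reading

print(round(pearson(adult_pitch, fluent_child), 2))  # near 1: expressive
print(round(pearson(adult_pitch, flat_child), 2))    # near 0: flat
```

Note the measure is scale- and offset-invariant, so a child whose pitch range sits higher or narrower than the adult's is not penalized as long as the shape of the contour matches.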
Article
Reprinted in 1968 and 1970-73.
Article
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Linguistics and Philosophy, 1980, by Janet Breckenridge Pierrehumbert. Bibliography: leaves 246-253.
Article
Native speakers of Japanese learning English generally have difficulty differentiating the phonemes /r/ and /l/, even after years of experience with English. Previous research that attempted to train Japanese listeners to distinguish this contrast using synthetic stimuli reported little success, especially when transfer to natural tokens containing /r/ and /l/ was tested. In the present study, a different training procedure that emphasized variability among stimulus tokens was used. Japanese subjects were trained in a minimal pair identification paradigm using multiple natural exemplars contrasting /r/ and /l/ from a variety of phonetic environments as stimuli. A pretest-posttest design containing natural tokens was used to assess the effects of training. Results from six subjects showed that the new procedure was more robust than earlier training techniques. Small but reliable differences in performance were obtained between pretest and posttest scores. The results demonstrate the importance of stimulus variability and task-related factors in training nonnative speakers to perceive novel phonetic contrasts that are not distinctive in their native language.
Article
Two experiments were carried out to extend Logan et al.'s recent study [J. S. Logan, S. E. Lively, and D. B. Pisoni, J. Acoust. Soc. Am. 89, 874-886 (1991)] on training Japanese listeners to identify English /r/ and /l/. Subjects in experiment 1 were trained in an identification task with multiple talkers who produced English words containing the /r/-/l/ contrast in initial singleton, initial consonant clusters, and intervocalic positions. Moderate, but significant, increases in accuracy and decreases in response latency were observed between pretest and posttest and during training sessions. Subjects also generalized to new words produced by a familiar talker and novel words produced by an unfamiliar talker. In experiment 2, a new group of subjects was trained with tokens from a single talker who produced words containing the /r/-/l/ contrast in five phonetic environments. Although subjects improved during training and showed increases in pretest-posttest performance, they failed to generalize to tokens produced by a new talker. The results of the present experiments suggest that variability plays an important role in perceptual learning and robust category formation. During training, listeners develop talker-specific, context-dependent representations for new phonetic categories by selectively shifting attention toward the contrastive dimensions of the non-native phonetic categories. Phonotactic constraints in the native language, similarity of the new contrast to distinctions in the native language, and the distinctiveness of contrastive cues all appear to mediate category acquisition.