Figure - available via license: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Source publication
This paper proposes a sentence stress feedback system in which sentence stress prediction, detection, and feedback provision models are combined. This system provides non-native learners with feedback on sentence stress errors so that they can improve their English rhythm and fluency in a self-study setting. The sentence stress feedback system was...
Citations
... To succeed in learning a new language, one of the most important qualities a learner can possess is the capacity for clear and concise communication. Learning a second language is therefore a way to increase one's ability to communicate with others (Lee et al., 2017). Because of this, there is a greater need to practice useful skills, such as giving presentations in public settings. ...
The objective of this research is to identify the complexities faced by non-native learners in achieving accuracy and fluency in English language speaking at the BS level. Students studying English as a second or foreign language often encounter challenges in effectively communicating in English. Limited exposure to the language and lack of opportunities for expressive speaking hinder their ability to communicate proficiently. Additionally, educational practices and policies, such as an emphasis on grammar and academic discussions, often neglect the development of practical communication skills. This study utilized a quantitative research approach to collect data from 150 BS English students at Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan. Convenience sampling was employed, and a questionnaire was used as the research instrument. Reliability and validity of the instrument were assessed through statistical analysis. The research findings indicate a significant positive impact of non-native language learners on English speaking skills. Additionally, accuracy and fluency were found to have a strong and positive influence on English speaking skills. Based on the study's findings, it is recommended to prioritize effective teaching strategies that focus on enhancing students' oral communication skills, providing ample opportunities for interpersonal communication and public speaking practice. Curriculum reforms should strike a balance between fluency and accuracy in language teaching to promote effective communication in English among non-native learners.
... The prosody of a language consists of intonation, rhythm, and stress. Each component can be described numerically as a sequence of pitch values, durations, and intensity values of smaller speech segments [12]. Although the value of each segment is static, sequential information over a span of segments can be captured when these values are fed to a sequential training architecture, as described in Section 2.2. ...
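As a rough illustration of describing prosody numerically, the sketch below turns frame-level pitch (F0) and energy tracks into one [mean F0, duration, mean intensity] vector per segment; the resulting sequence is the kind of input a sequential model such as an LSTM consumes. The 10 ms frame shift, the convention that unvoiced frames carry F0 = 0, and the `Segment` type are assumptions for illustration, not details from the cited work.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    start: int  # index of the first frame (inclusive)
    end: int    # index one past the last frame (exclusive)


def segment_prosody(f0, energy, segments, frame_shift_s=0.01):
    """Return one [mean F0 (Hz), duration (s), mean energy] vector per segment.

    f0 and energy are frame-level tracks of equal length; unvoiced frames are
    assumed to carry f0 == 0 and are left out of the pitch average.
    """
    features = []
    for seg in segments:
        voiced = [v for v in f0[seg.start:seg.end] if v > 0]
        pitch = mean(voiced) if voiced else 0.0
        duration = (seg.end - seg.start) * frame_shift_s
        intensity = mean(energy[seg.start:seg.end])
        features.append([pitch, duration, intensity])
    return features


# Two three-frame segments (e.g. syllables) taken from toy tracks.
f0 = [0.0, 120.0, 125.0, 0.0, 200.0, 210.0]
energy = [0.1, 0.5, 0.6, 0.2, 0.8, 0.9]
sequence = segment_prosody(f0, energy, [Segment(0, 3), Segment(3, 6)])
```

Each row of `sequence` is one time step for the sequential model; per-segment values stay static, and the context comes from the model reading the rows in order.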
Telephone fraud, or voice phishing, is becoming a major issue in South Korea, and there is high demand for AI-assisted forensic solutions that can effectively narrow down the pool of possible suspects. One such demand is the use of automated Korean dialect identification in speaker profiling, which aims to identify a suspect's candidate dialects by analyzing the dialectal patterns found in the segmental and prosodic aspects of speech. In this paper, an ensemble of dialect classifiers is proposed that considers both segmental and prosodic speech features. The classifier network is an ensemble of two sub-networks: an attention-based bidirectional LSTM network for prosodic feature learning and a vanilla DNN for segmental feature learning. A public dataset of Korean conversational speech is used, and the proposed model achieves an F-measure of 61.28%, outperforming the baseline by 25.87 percentage points.
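The abstract does not state how the two sub-networks' outputs are combined. One common option, assumed here purely for illustration, is late fusion: convert each branch's logits to class posteriors and take a weighted average. The branch names and the `weight` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(prosodic_logits, segmental_logits, weight=0.5):
    """Late fusion of two sub-networks by a weighted average of class posteriors.

    prosodic_logits: output of the prosodic branch (e.g. an attention BiLSTM);
    segmental_logits: output of the segmental branch (e.g. a plain DNN);
    weight: share given to the prosodic branch (0.5 is a plain average).
    Returns (index of the predicted dialect class, fused posteriors).
    """
    p = softmax(prosodic_logits)
    s = softmax(segmental_logits)
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(p, s)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

When one branch is very confident and the other is not, the averaged posterior lets the confident branch dominate, which is the main appeal of this simple scheme.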
... Acquiring early reading skills is an integral part of children's education and of second language learning. Reading is an effective way of drawing meaning from printed material and a means of foreign and second language learning (Uchidiuno et al., 2018; Lee et al., 2017; Beltrán-Planques & Querol-Julián, 2018; Schneider, 2019). Griffiths et al. (2016) stated that reading dramatically affects students' educational performance. ...
As an international language, English is a necessity of our time. Reading is one of the essential skills of language learning for non-native English learners. In recent years, declining reading among students has become a point of discussion among educators. In this article, we examine how the quality of English reading is gradually declining among non-native English learners. For this purpose, we measure the reading abilities of grade 5 students who have various mother tongues and presumably similar socioeconomic backgrounds. The data of students (N=589) were collected through convenience sampling. Students were assessed in three domains of reading (fluency, comprehension, and reading vocabulary) with the self-developed English Reading Achievement Scale (ERAS). The results demonstrated unsatisfactory English reading performance among the sampled students: only 56 percent of students passed the English reading assessment. Students at girls' schools performed comparatively better in reading than students at boys' schools. Students' performance was relatively better in reading fluency and worse in reading vocabulary.
... [Journal of Sensors, p. 10] ... fully understand the pronunciation. Preprocessing in the high-sensitivity acoustic wave sensor includes pre-emphasis of the English pronunciation error signal, framing and windowing, and endpoint detection [22]. After the test pronunciation and the standard pronunciation are preprocessed, feature extraction and pattern-matching calculations are performed. ...
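A minimal sketch of the three preprocessing steps named here, assuming common textbook choices that the excerpt does not specify: first-order pre-emphasis with coefficient 0.97, a Hamming window, and endpoint detection that keeps frames whose short-time energy exceeds 10% of the loudest frame.

```python
import math


def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts the high-frequency part of speech."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frames(x, frame_len, hop):
    """Split x into overlapping frames and apply a Hamming window to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]


def detect_endpoints(framed, threshold_ratio=0.1):
    """Energy-based endpoint detection.

    Returns (first, last) indices of frames whose short-time energy exceeds
    threshold_ratio times the maximum frame energy, or None if no frame does.
    """
    energies = [sum(s * s for s in f) for f in framed]
    if not energies:
        return None
    threshold = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return (active[0], active[-1]) if active else None
```

For example, at 8 kHz with 200-sample frames and a 100-sample hop, a tone preceded and followed by silence yields the indices of the first and last frames that overlap the tone; real systems usually add a zero-crossing-rate check to catch weak fricatives.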
For an English pronunciation error correction system, the level of correction performance and the reliability, practicability, and adaptability of its information feedback are the main criteria for evaluating its overall performance. Traditional English pronunciation correction systems fail to give timely feedback on and correction of learners' pronunciation errors, improve learners' English proficiency slowly, and can even mislead learners; it is therefore imperative to design a scientific and efficient automatic correction system for English pronunciation errors. High-sensitivity acoustic wave sensors can identify the English pronunciation error signal and convert the dimensions of the collected pronunciation signal according to channel configuration information; they can then assist the automatic correction system in filtering out interference components in the output signal, analyzing the real-time spectrum, and evaluating the sensitivity of the acoustic wave sensor. 
Therefore, on the basis of a summary and analysis of previous research, this paper sets out the current research status and significance of designing an automatic correction system for English pronunciation errors; describes the development background, current status, and future challenges of high-sensitivity acoustic wave sensor technology; introduces the methods and principles of time-domain signal amplitude measurement and pronunciation signal preprocessing; optimizes the design of the pronunciation recognition sensor; improves the design of the pronunciation recognition processor; proposes a hardware design for the automatic correction system assisted by high-sensitivity acoustic wave sensors; analyzes the design of the acquisition program for English pronunciation errors; implements parameter extraction for the English pronunciation error signal; discusses the software design of the sensor-assisted correction system; and finally conducts a system test and analyzes its results. The results show that the automatic correction system assisted by high-sensitivity acoustic wave sensors can automatically correct the amplitude linearity, sensitivity, repeatability error, and return error of English pronunciation errors, and that it provides robust automatic real-time data collection, processing, saving, query, and retesting. The system can also minimize external interference, improve the accuracy of acoustic wave sensor sensitivity calibration, and provide functions such as reading and saving English pronunciation error signals and visual operation, which effectively improves the ease of use and completeness of the correction system. 
The results of this paper provide a reference for further research on the design of automatic correction systems for English pronunciation errors assisted by high-sensitivity acoustic wave sensors.
1. Introduction
In the process of learning English, some learners' spoken English is poor, and spoken language, as a critical and difficult part of English learning, has received increasing attention. It is therefore imperative to design a scientific and efficient automatic correction system for English pronunciation errors. Traditional English pronunciation correction systems cannot provide timely feedback on and correction of learners' pronunciation errors, and they have disadvantages such as misleading learners and slowly improving learners' English proficiency [1]. For an automatic correction system for English pronunciation errors, the level of correction performance and the reliability and practicability of information feedback are the main criteria for evaluating its overall performance. The quality of the correction algorithm determines the correction performance, and a sound error detection method underpins it [2]. After each subobjective of the multiobjective problem is decomposed and optimized, the high-sensitivity acoustic wave sensor trades off and coordinates the subobjectives against one another. This is because the input and output information required for automatic correction of English pronunciation errors is related to the open failure system. The automatic correction system for English pronunciation errors can be divided into two parts: system training and pronunciation correction [3]. System training is similar to training in an automatic pronunciation recognition system: features of known standard pronunciations are extracted and stored as the reference for pronunciation correction. Pronunciation correction assesses the accuracy of the pronunciation under test. The basic process is to extract the features of the pronunciation under test, compare them with the standard pronunciation features, and calculate a score based on their similarity [4].
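The basic correction loop described above (extract features of the test pronunciation, compare them with stored standard features, score by similarity) is often implemented with dynamic time warping (DTW), because a learner's utterance and the reference rarely have the same length. The paper does not name its matching algorithm, so DTW, the Euclidean frame distance, length normalization, and the `100 / (1 + d)` score mapping with its `scale` constant are all illustrative assumptions.

```python
import math


def dtw_distance(test, ref):
    """DTW distance between two sequences of feature vectors.

    Local cost is the Euclidean distance between frames; the accumulated cost
    of the best warping path is normalized by the combined sequence length.
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(test[i - 1], ref[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j],      # test frame repeated
                                   acc[i][j - 1],      # reference frame repeated
                                   acc[i - 1][j - 1])  # both advance
    return acc[n][m] / (n + m)


def similarity_to_score(distance, scale=1.0):
    """Map a distance to a 0-100 score; scale is a hypothetical tuning constant."""
    return 100.0 / (1.0 + distance / scale)
```

A learner who stretches a sound still matches the reference perfectly after warping, so only genuine spectral differences lower the score.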
The high-sensitivity acoustic wave sensor can follow an artificial neural network model, use target tracking to design the automatic correction system, and form an abstract logic layer by combining the characteristics of English pronunciation errors. The single-target tracking algorithm resembles a traditional neural network in that both use a hierarchical structure to construct the logic layer; the difference is that a three-layer construction is the most suitable for the automatic correction system [5]. Building on the optimized design of the pronunciation recognition sensor and the improved design of the pronunciation recognition processor, the software design of the system is completed on the basis of the English pronunciation acquisition program and the extraction of English pronunciation error signal parameters. In this process, although the amount of data is large and the calculations are complicated, the calculation procedure is the same for every sentence [6]. Analog-to-digital signal conversion is needed to improve data sampling efficiency, the sampling efficiency must not fall below a set value, and the single-target tracking algorithm is used to perform repeated iterative calculations [7]. In pronunciation recognition, a multifrequency oscillator is designed to calibrate pronunciation accuracy automatically, and calibration of the circuit conversion is the key to converting the mode of the English printing information. Collecting and controlling the circuit's original pronunciation information improves the accuracy of the system's automatic correction data [8].
Based on a summary and analysis of previous research, this paper sets out the current research status and significance of designing an automatic correction system for English pronunciation errors; describes the development background, current status, and future challenges of high-sensitivity acoustic wave sensor technology; introduces the methods and principles of time-domain signal amplitude measurement and pronunciation signal preprocessing; optimizes the design of the pronunciation recognition sensor; improves the design of the pronunciation recognition processor; proposes a hardware design for the automatic correction system assisted by high-sensitivity acoustic wave sensors; analyzes the design of the acquisition program for English pronunciation errors; implements parameter extraction for the English pronunciation error signal; discusses the software design of the sensor-assisted correction system; and finally conducts a system test and analyzes its results. The remaining sections are organized as follows: Section 2 introduces the methods and principles of time-domain signal amplitude measurement and pronunciation signal preprocessing; Section 3 proposes the hardware design of the automatic correction system assisted by high-sensitivity acoustic wave sensors; Section 4 discusses the software design of the sensor-assisted correction system; Section 5 presents the system test and analysis of its results; Section 6 concludes the paper.
2. Methods and Principles
2.1. Amplitude Measurement of Time Domain Signal
From the perspective of the characteristics of the automatic correction system for English pronunciation errors, the role of the high-sensitivity acoustic wave sensor is essentially that of a system function chosen to obtain the required frequency-response characteristics, and the same holds for digital filtering. For a linear time-invariant causal analog system, the relationship between input and output is

$$y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau,$$

where $x(t)$ is the input of the system; $y(t)$ is the output response of the system; $t$ is continuous time; $h(t)$ is the impulse response of the system, whose transform is its transfer function; and $*$ denotes the convolution operator.
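After sampling, the convolution above becomes the discrete sum y[n] = Σ_k x[k] h[n−k], which is exactly what an FIR digital filter computes. The sketch evaluates it directly; the two-tap averaging filter in the example is an arbitrary choice, not taken from the paper.

```python
def convolve(x, h):
    """Full discrete linear convolution y[n] = sum_k x[k] * h[n - k].

    Output length is len(x) + len(h) - 1, the discrete counterpart of
    y(t) = x(t) * h(t) for a linear time-invariant system.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for j, hj in enumerate(h):
            y[k + j] += xk * hj  # each input sample spreads a scaled copy of h
    return y


smoothed = convolve([1.0, 2.0, 3.0], [0.5, 0.5])  # two-tap moving average
```

Feeding a unit impulse returns `h` itself, which is why `h` is called the impulse response.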
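For an input English phoneme $q$ of the system, given the observation vector $\mathbf{o}_t$ of each frame $t$ of the $i$th pronunciation segment associated with it, the frame-based posterior probability is calculated as

$$P(q \mid \mathbf{o}_t) = \frac{p(\mathbf{o}_t \mid q)\, P(q)}{\sum_{r \in Q} p(\mathbf{o}_t \mid r)\, P(r)},$$

where $p(\mathbf{o}_t \mid q)$ is the probability distribution of the observation vector given phoneme $q$; $P(q)$ is the prior probability of phoneme $q$; and the sum in the denominator runs over the set $Q$ of all text-independent phonemes.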
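Computing this posterior directly multiplies small probabilities, so implementations usually work with log-likelihoods and normalize with the log-sum-exp trick. A minimal sketch, assuming the per-phoneme log-likelihoods and log-priors for one frame are already available; the dictionary inputs and phone labels are hypothetical.

```python
import math


def frame_posterior(log_likelihoods, log_priors, phone):
    """P(phone | o_t) via Bayes' rule, normalized over all phonemes.

    log_likelihoods[q] = log p(o_t | q) and log_priors[q] = log P(q), both
    keyed by phone label. The denominator is computed with log-sum-exp so
    that very small likelihoods do not underflow.
    """
    joint = {q: log_likelihoods[q] + log_priors[q] for q in log_likelihoods}
    m = max(joint.values())
    log_norm = m + math.log(sum(math.exp(s - m) for s in joint.values()))
    return math.exp(joint[phone] - log_norm)
```

With likelihoods 0.2 and 0.6 and equal priors, the posterior of the first phone is 0.1 / (0.1 + 0.3) = 0.25.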
In the design of the high-sensitivity acoustic wave sensor-assisted automatic correction system for English pronunciation errors, the sensitivity of the standard acoustic wave sensor has already been measured by first-level calibration, so the sensitivity of the sensor under test is finally calculated as

$$S_x = S_r \cdot \frac{U_x}{U_r},$$

where $S_r$ is the sensitivity of the standard (reference) acoustic wave sensor; $S_x$ is the sensitivity of the acoustic wave sensor under test; $U_x$ is the amplitude of the acoustic wave sensor under test; and $U_r$ is the amplitude of the reference acoustic wave sensor.
In developing the correction system, the English pronunciation error signal is first recognized, and the collected signal is then converted in dimension according to the channel configuration information. The high-sensitivity acoustic wave sensor is then embedded in the correction system. During measurement, the pronunciation signal is first filtered to remove interference components from the acoustic wave sensor's output; the system then performs real-time spectrum analysis on the filtered English pronunciation error signal and evaluates the sensitivity of the acoustic wave sensor [9]. The system can therefore minimize external interference and improve the accuracy of sensor sensitivity calibration when the on-site environment is noisy. In addition, the software provides auxiliary functions such as reading and saving the pronunciation error signal and a visualization area, which improve the ease of use and completeness of the system. The system uses a controlled signal source and an oscilloscope to send and collect the English pronunciation error signals. To obtain the required number of measurements, a loop control structure repeats the sensitivity measurement of the sensor under test for a set number of cycles, and oscilloscope signal acquisition is built into the program to ensure complete signal reception, finally realizing channel triggering and channel reception.
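A minimal sketch of this repeated comparison calibration, assuming the usual back-to-back relation S_x = S_r · U_x / U_r (sensitivity of the sensor under test equals the reference sensitivity times the ratio of output amplitudes). Further assumptions: the excitation is a single tone of known frequency, and its amplitude is read from one DFT bin, a simple stand-in for the paper's real-time spectrum analysis that also rejects interference at other whole-period frequencies. The 50 mV/Pa reference sensitivity in the usage note is made up.

```python
import cmath
import math
from statistics import mean, stdev


def tone_amplitude(signal, freq, fs):
    """Peak amplitude of the component at `freq` Hz, from a single DFT bin.

    Exact when the record spans a whole number of periods of `freq`; other
    components with a whole number of periods in the record drop out.
    """
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)


def calibrate(ref_records, test_records, s_ref, freq, fs):
    """Comparison calibration repeated over several measurement cycles.

    Each cycle gives S_x = S_r * U_x / U_r from the tone amplitudes of the
    reference and test records; returns (mean S_x, sample standard deviation).
    """
    estimates = [s_ref * tone_amplitude(t, freq, fs) / tone_amplitude(r, freq, fs)
                 for r, t in zip(ref_records, test_records)]
    spread = stdev(estimates) if len(estimates) > 1 else 0.0
    return mean(estimates), spread
```

For example, with a 1 kHz tone sampled at 8 kHz, a reference output of amplitude 1, a test output of amplitude 2, and S_r = 50 mV/Pa, `calibrate` returns a mean of about 100 mV/Pa; the standard deviation across cycles indicates measurement repeatability.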
2.2. Pronunciation Signal Preprocessing
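After the high-sensitivity acoustic wave sensor computes the ratio between the output and input signals, the system function is obtained by transforming this ratio. The acoustic wave sensor is designed with the impulse response method, and the general form of the function used for pronunciation error correction is the Viterbi-style recursion

$$\delta_t(j) = \max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right] b_j(\mathbf{o}_t), \qquad \psi_t(j) = \arg\max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right], \qquad q_t^{*} = \psi_{t+1}\!\left(q_{t+1}^{*}\right),$$

where $b_j(\mathbf{o}_t)$ is the coefficient (output probability) of the $j$th state at time $t$; $a_{ij}$ is the transition probability from state $i$ to state $j$; $\delta_t(j)$ is the cumulative output probability of the $j$th state at time $t$; $\psi_t(j)$ is the number of the previous state of the $j$th state at time $t$; and $q_t^{*}$ is the state at time $t$ in the optimal state sequence.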
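The quantities defined here (cumulative output probability, previous-state number, optimal state sequence) are those of Viterbi decoding over a hidden Markov model. Assuming that is what the correction function computes, a standard log-domain Viterbi can be sketched as follows; the transition matrix and initial probabilities are inputs that the text does not specify.

```python
def viterbi(obs_loglik, log_trans, log_init):
    """Standard Viterbi decoding in the log domain.

    obs_loglik[t][j]: log b_j(o_t); log_trans[i][j]: log a_ij;
    log_init[j]: log pi_j (use float("-inf") for impossible entries).
    Returns (best state path, its log probability).
    """
    n_frames, n_states = len(obs_loglik), len(log_init)
    # delta[j]: best cumulative log probability of any path ending in state j.
    delta = [log_init[j] + obs_loglik[0][j] for j in range(n_states)]
    backptrs = []  # backptrs[t-1][j]: best previous state of j at time t
    for t in range(1, n_frames):
        prev = delta
        step, delta = [], []
        for j in range(n_states):
            i_best = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            step.append(i_best)
            delta.append(prev[i_best] + log_trans[i_best][j] + obs_loglik[t][j])
        backptrs.append(step)
    # Backtrack from the best final state: q*_t = psi_{t+1}(q*_{t+1}).
    state = max(range(n_states), key=lambda j: delta[j])
    best = delta[state]
    path = [state]
    for step in reversed(backptrs):
        state = step[state]
        path.append(state)
    return path[::-1], best
```

Working in the log domain turns the products in the recursion into sums and avoids underflow on long utterances.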
Taking the logarithm of the frame-level posterior probability of phoneme $q$ for each frame of the $i$th pronunciation segment of the English pronunciation error signal and averaging over the segment gives the log-posterior score of phoneme $q$ for that segment:

$$\rho_i(q) = \frac{1}{d_i} \sum_{t=t_i}^{t_i + d_i - 1} \log P(q \mid \mathbf{o}_t),$$

where $d_i$ is the duration (in frames) of the $i$th segment corresponding to phone $q$; $t_i$ is its first frame; the factor $1/d_i$ normalizes the score for segment duration; $P(q \mid \mathbf{o}_t)$ is the frame-level posterior; and $\rho_i(q)$ is the final output score.
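Given the frame-level posteriors for the frames aligned to a phone, the segment score is simply their average log value. A minimal sketch; the `is_mispronounced` helper and its threshold of -1.0 are hypothetical, since such thresholds are normally tuned on annotated data.

```python
import math


def log_posterior_score(frame_posteriors, floor=1e-12):
    """Segment-level score rho = (1/d) * sum_t log P(q | o_t).

    frame_posteriors: P(q | o_t) for each of the d frames aligned to phone q.
    The floor avoids log(0). Scores near 0 mean the frames strongly support q;
    large negative scores suggest a mispronunciation.
    """
    d = len(frame_posteriors)
    return sum(math.log(max(p, floor)) for p in frame_posteriors) / d


def is_mispronounced(score, threshold=-1.0):
    """Flag a phone whose score falls below a (hypothetical) tuned threshold."""
    return score < threshold
```

Dividing by d keeps long and short phones comparable, so one threshold can be applied across the phone inventory.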
The acoustic wave sensor-assisted automatic correction system treats English pronunciation errors as an ordinary pronunciation classification problem and solves it with a classification model. The model is a four-layer feed-forward network that includes a pronunciation vector mapping table; the input-layer vector is propagated forward through the network as

$$\mathbf{h}^{(l)} = f\left(\mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\right), \qquad \mathbf{h}^{(0)} = \mathbf{E}[x], \qquad \mathbf{E} \in \mathbb{R}^{|V| \times m},$$

where $\mathbf{W}^{(l)}$ is the network weight; $\mathbf{b}^{(l)}$ is the bias value; $f$ is the activation function; $\mathbf{h}^{(l)}$ is the output value of layer $l$; $\mathbf{E}[x]$ is the row of the vector table for input $x$; $\eta$ is the learning rate used when training the weights; $m$ is the dimension of the pronunciation-error vector; and $|V|$ is the size of the pronunciation-error vector table.
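A minimal forward pass for a network of this shape: a vector table (embedding), two hidden layers, and a softmax output layer over error classes, giving four layers in total. Averaging the table vectors of the input units, ReLU activations, random initialization, and the layer sizes are assumptions for illustration; only inference is shown, so the learning rate does not appear.

```python
import math
import random


def dense(weights, bias, x):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    total = sum(e)
    return [a / total for a in e]


class FeedForwardClassifier:
    """Vector table -> two hidden layers -> softmax output (four layers)."""

    def __init__(self, vocab_size, embed_dim, hidden, n_classes, seed=0):
        rng = random.Random(seed)

        def layer(n_out, n_in):
            w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
            return w, [0.0] * n_out

        # Pronunciation vector mapping table: one embed_dim vector per entry.
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                      for _ in range(vocab_size)]
        self.layers = [layer(hidden, embed_dim), layer(hidden, hidden),
                       layer(n_classes, hidden)]

    def forward(self, token_ids):
        # Input layer: average the table vectors of the input units.
        vecs = [self.table[i] for i in token_ids]
        h = [sum(col) / len(vecs) for col in zip(*vecs)]
        # Hidden layers: h_l = f(W_l h_{l-1} + b_l) with f = ReLU.
        for w, b in self.layers[:-1]:
            h = relu(dense(w, b, h))
        w, b = self.layers[-1]
        return softmax(dense(w, b, h))  # class posteriors
```

Training would adjust `table` and every `(w, b)` pair by gradient descent with learning rate η; the forward pass above is unchanged by that.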
English pronunciation error preprocessing includes sampling of the pronunciation signal and anti-aliasing band-pass filtering to remove individual pronunciation differences and noise caused by equipment and environment. Because a pronunciation signal is a non-stationary random process, the high-sensitivity acoustic wave sensor must process it over short time spans, which involves selecting recognition primitives and detecting endpoints [10]. Endpoint detection determines the start and end of the pronunciation within the signal and is an important part of preprocessing. Pronunciation recognition is a process of digitally processing the pronunciation signal, so the signal must first be digitized by analog-to-digital conversion. Analog-to-digital conversion consists of two steps, sampling and quantization, which yield a signal that is discrete in both time and amplitude; pre-emphasis is usually performed after anti-aliasing filtering and before the transformation. After the system obtains the learner's follow-up (imitated) pronunciation, it extracts its features, calculates their similarity to the standard pronunciation in the test question bank, and finally maps the similarity to a grade score that is easier for the learner to understand and accept.
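To make the two analog-to-digital steps concrete, the sketch samples a continuous-time function at `fs` Hz (time discretization) and uniformly quantizes each sample to a signed `bits`-bit code (amplitude discretization). The sampling rate must exceed twice the highest frequency present (the Nyquist criterion), which is why anti-aliasing filtering comes first. The 1 kHz test tone, 8 kHz rate, and 16-bit depth in the usage note are example values only.

```python
import math


def sample_and_quantize(signal_fn, fs, duration_s, bits):
    """Sample signal_fn(t) at fs Hz, then quantize uniformly to `bits` bits.

    Amplitudes are assumed to lie in [-1, 1]; codes are signed integers in
    [-2**(bits-1), 2**(bits-1) - 1], as in two's-complement PCM.
    """
    n = int(round(fs * duration_s))
    levels = 2 ** (bits - 1)
    codes = []
    for k in range(n):
        x = signal_fn(k / fs)                              # sampling: discrete time
        q = int(round(x * levels))                         # quantization: discrete amplitude
        codes.append(max(-levels, min(levels - 1, q)))     # clip to the code range
    return codes
```

For example, `sample_and_quantize(lambda t: math.sin(2 * math.pi * 1000 * t), 8000, 0.001, 16)` yields 8 codes, with the positive peak clipped to 32767 and the negative peak at -32768.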
3. Hardware Design of Automatic Correction System Based on High-Sensitivity Acoustic Wave Sensors
3.1. Optimization Design of Pronunciation Recognition Sensors
To ensure the accuracy, reliability, consistency, and adaptability of English pronunciation error correction and to keep pace with the trend toward automatic correction, the hardware design must effectively supervise the accuracy and reliability of the acoustic wave sensor's measurement value transfer, so as to standardize and refine sensor calibration. The main components are a grating data conditioning module, an output conditioning module for the sensor under calibration, an acquisition device, and a computer system. Together, these modules correct the amplitude linearity, sensitivity, repeatability error, and return error of English pronunciation errors, and they provide automatic real-time data collection, data processing, storage, query, and remeasurement. The grating ruler's output is converted into a corresponding electrical signal by its conditioning circuit and processed by the formant acquisition card, and the English pronunciation signal is then input into the system. The sensor under calibration outputs a corresponding voltage or current through its conditioning circuit to the data acquisition device, which communicates over a serial interface. The data acquisition device is set to the reset state, the trigger condition is established, and the control settings are initialized; the device then enters the working state, the serial port is opened, the output signal is input into the computer system, and finally the collected data are analyzed and processed [11]. Figure 1 shows the design framework of the automatic correction system for English pronunciation errors assisted by high-sensitivity acoustic wave sensors.
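The four static metrics this module reports can be computed from a calibration table. Below is a minimal sketch using one common set of definitions, assumed here rather than taken from the paper: a least-squares line gives the sensitivity (slope); linearity error is the largest deviation from that line; return (hysteresis) error is the largest gap between the increasing and decreasing passes; repeatability error is the largest spread across repeated runs at one point; all are expressed as a percentage of the fitted full-scale output. Some standards instead use a multiple of the standard deviation for repeatability.

```python
def linear_fit(x, y):
    """Least-squares straight line y ~ a + b x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def calibration_errors(x, up_runs, down_runs):
    """Static calibration metrics, errors as % of fitted full-scale output.

    x: calibration input points; up_runs / down_runs: lists of repeated output
    readings taken while the input increases / decreases (same order as x).
    """
    pts = range(len(x))
    up_mean = [sum(r[i] for r in up_runs) / len(up_runs) for i in pts]
    down_mean = [sum(r[i] for r in down_runs) / len(down_runs) for i in pts]
    y_mean = [(u + d) / 2 for u, d in zip(up_mean, down_mean)]
    a, b = linear_fit(x, y_mean)
    full_scale = abs(b * (max(x) - min(x)))  # output span of the fitted line
    linearity = max(abs(y - (a + b * xi)) for xi, y in zip(x, y_mean))
    hysteresis = max(abs(u - d) for u, d in zip(up_mean, down_mean))
    repeatability = max(max(r[i] for r in runs) - min(r[i] for r in runs)
                        for runs in (up_runs, down_runs) for i in pts)
    return {
        "sensitivity": b,
        "linearity_pct": 100 * linearity / full_scale,
        "hysteresis_pct": 100 * hysteresis / full_scale,
        "repeatability_pct": 100 * repeatability / full_scale,
    }
```

An ideal sensor gives zero for all three error terms; a constant offset between the up and down passes shows up only as return error, not as non-linearity.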
... Minematsu et al. [186] extend this to six-way classification, also distinguishing between sentence stresses marking the beginning and end of a phrase. Lee et al. [160] perform lexical stress detection first, then combine the stress features of the stressed syllable of each word with lexical and syntactic features, specifically its identity $w_i$, a part-of-speech (POS) tag and a class tag (function word vs. content word) obtained from a sentence analyser [35], and the number of vowels and syllables it contains. The combined feature vector for each word is passed, along with the vectors for the two preceding and three following words, through a linear-chain Conditional Random Field (CRF) classifier, trained on a stress-annotated corpus, to detect whether each word is sentence-stressed. ...
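A minimal sketch of the context window described for Lee et al.: each word's feature dict is augmented with those of the two preceding and three following words before sequence labelling. Feature names and the padding convention are hypothetical, and the CRF itself is not shown; lists of per-token feature dicts like these are the input format accepted by CRF toolkits such as sklearn-crfsuite.

```python
def window_features(words, left=2, right=3):
    """Build per-word feature dicts that include neighbouring words' features.

    words: list of dicts with per-word features (e.g. identity, POS tag,
    function/content class, syllable count, stress features). Each output key
    is prefixed with the relative offset, e.g. "-1:pos" or "+2:w"; positions
    outside the sentence contribute a single padding feature.
    """
    sequence = []
    for i in range(len(words)):
        feats = {}
        for offset in range(-left, right + 1):
            j = i + offset
            source = words[j] if 0 <= j < len(words) else {"pad": True}
            for name, value in source.items():
                feats[f"{offset:+d}:{name}"] = value
        sequence.append(feats)
    return sequence
```

Prefixing each feature with its offset lets the CRF learn, for example, that a content word following a function word is more likely to carry sentence stress.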
Growing global demand for learning a second language (L2), particularly English, has led to considerable interest in automatic spoken language assessment, whether for use in computer-assisted language learning (CALL) tools or for grading candidates for formal qualifications. This thesis presents research into the automatic assessment of spontaneous non-native English speech, with a view to providing meaningful feedback to learners. One of the challenges in automatic spoken language assessment is giving candidates feedback on particular aspects, or views, of their spoken language proficiency, in addition to the overall holistic score normally provided. Another is detecting pronunciation and other types of errors at the word or utterance level and feeding them back to the learner in a useful way. It is usually difficult to obtain accurate training data with separate scores for different views, and, as examiners are often trained to give holistic grades, single-view scores can suffer from consistency issues. Conversely, holistic scores are available for various standard assessment tasks such as Linguaskill. An investigation is thus conducted into whether assessment scores linked to particular views of the speaker's ability can be obtained from systems trained using only holistic scores. End-to-end neural systems are designed with structures and forms of input tuned to single views, specifically pronunciation, rhythm, intonation, and text. By training each system on large quantities of candidate data, it should be possible to extract individual-view information. The relationships between the predictions of each system are evaluated to examine whether they are, in fact, extracting different information about the speaker. Three methods of combining the systems to predict the holistic score are investigated: averaging their predictions, concatenating their intermediate representations, and attending over their intermediate representations. 
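The three combination strategies named in this abstract can be sketched as small functions over the per-view graders' outputs. In practice the output weights and the attention query would be learned jointly; here they are passed in, and the dot-product attention score and linear output layer are assumptions, not the thesis's exact architecture.

```python
import math


def average_predictions(preds):
    """Method 1: mean of the per-view graders' predicted scores."""
    return sum(preds) / len(preds)


def concat_then_linear(reprs, weights, bias):
    """Method 2: concatenate the views' intermediate vectors, then a linear layer."""
    z = [v for r in reprs for v in r]
    return sum(w * v for w, v in zip(weights, z)) + bias


def attend_then_linear(reprs, query, weights, bias):
    """Method 3: softmax attention over views, then a linear layer.

    Each view's vector is scored by a dot product with `query`; the scores
    are softmax-normalized and used to form a weighted sum of the vectors.
    Returns (predicted score, attention weights per view).
    """
    scores = [sum(q * v for q, v in zip(query, r)) for r in reprs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    context = [sum(a * r[k] for a, r in zip(alphas, reprs))
               for k in range(len(reprs[0]))]
    return sum(w * v for w, v in zip(weights, context)) + bias, alphas
```

A side benefit of the attention variant is that the weights `alphas` show how much each view (pronunciation, rhythm, intonation, text) contributed to a given holistic prediction.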
The combined graders are compared to each other and to baseline approaches. The tasks of error detection and error tendency diagnosis become particularly challenging when the speech is spontaneous, especially given the inconsistency of human annotation of pronunciation errors. An approach to these tasks is presented that distinguishes between lexical errors, where the speaker does not know how a particular word is pronounced, and accent errors, where the candidate's speech exhibits consistent patterns of phone substitution, deletion, and insertion. Three annotated corpora of non-native English speech by speakers of multiple L1s are analysed, the consistency of human annotation is investigated, and a method is presented for detecting individual accent and lexical errors and diagnosing accent error tendencies at the speaker level.
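Both accent and lexical errors surface as phone substitutions, deletions, and insertions relative to a canonical pronunciation. A minimal sketch: align canonical and recognized phone sequences by edit distance, extract the error operations, and count them per speaker; patterns that recur across many different words hint at an accent tendency rather than a one-off lexical error. The phone labels and the simple counting rule are illustrative; the thesis's actual detection method is not reproduced here.

```python
from collections import Counter


def align_phones(ref, hyp):
    """Edit-distance alignment of canonical (ref) and recognized (hyp) phones.

    Returns the error operations as (op, ref_phone, hyp_phone) tuples, with
    op in {"sub", "del", "ins"}; matched phones are omitted.
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j] + 1,                              # deletion
                             cost[i][j - 1] + 1,                              # insertion
                             cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))  # match/sub
    ops, i, j = [], n, m
    while i > 0 or j > 0:  # backtrack from the bottom-right corner
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                ops.append(("sub", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    return ops[::-1]


def error_tendencies(utterances):
    """Count error patterns over all (ref, hyp) phone-sequence pairs of a speaker."""
    counts = Counter()
    for ref, hyp in utterances:
        counts.update(align_phones(ref, hyp))
    return counts
```

For instance, a speaker whose /th/ is recognized as /s/ in many different words accumulates a high count for `("sub", "th", "s")`, a candidate accent tendency.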
... Prosody and suprasegmental pronunciation phenomena are important for learning native-like speaking skills. Some research has focused on language learners' lexical stress [1,2], intonation [3], and vowel reduction [4]. ...
... We take advantage of the hierarchical attention mechanism. Specifically, the context vectors $r^1_l$ and $r^2_l$ from the main and auxiliary encoders are computed in a similar way to Equation (2). The fusion vector containing multi-stream information is obtained as a combination of $r^1_l$ and $r^2_l$ as follows: ...
Vowel reduction is a common pronunciation phenomenon in stress-timed languages like English. Native speakers tend to weaken unstressed vowels into a schwa-like sound, and the absence of this reduction is an essential factor that makes language learners' accents sound unnatural. To improve vowel reduction detection in a phoneme recognition framework, we propose an end-to-end vowel reduction detection method that introduces pronunciation prior knowledge as auxiliary information. In particular, we have designed two methods for automatically generating pronunciation prior sequences from reference texts and have implemented a main and auxiliary encoder structure that uses hierarchical attention mechanisms to exploit the pronunciation prior information and the acoustic information dynamically. In addition, we propose a method for feature enhancement after encoding that uses attention between different streams to obtain expanded multi-streams. Compared with the HMM-DNN hybrid method and the general end-to-end method, our approach increases the average F1 score for the two types of vowel reduction detection by 8.8% and 6.9%, respectively, and the overall phoneme recognition rate by 5.8% and 5.0%, respectively. The experimental section further analyzes why the auxiliary pronunciation prior input is effective and how different types of pronunciation prior knowledge affect performance.
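The exact fusion equation is elided in the excerpt above. As a rough illustration of stream-level (hierarchical) attention in general, the sketch scores each encoder stream's context vector against the decoder state, normalizes the scores with a softmax, and returns the weighted sum. The element-wise scoring form and the parameter names `w_state`, `w_ctx`, and `v` are simplifying assumptions, not the authors' formulation.

```python
import math


def fuse_streams(contexts, state, w_state, w_ctx, v):
    """Stream-level attention over per-stream context vectors r^k_l.

    contexts: one context vector per encoder stream (e.g. main, auxiliary);
    state: decoder state s_{l-1}. Each stream gets the simplified score
    e_k = sum_d v[d] * tanh(w_state[d] * s[d] + w_ctx[d] * r_k[d]);
    softmax over streams gives beta_k, and the fused vector is
    sum_k beta_k * r^k_l. Returns (fused vector, stream weights).
    """
    scores = [sum(vd * math.tanh(ws * sd + wc * rd)
                  for vd, ws, sd, wc, rd in zip(v, w_state, state, w_ctx, r))
              for r in contexts]
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    betas = [e / total for e in exps]
    fused = [sum(b * r[d] for b, r in zip(betas, contexts))
             for d in range(len(contexts[0]))]
    return fused, betas
```

Because the weights are recomputed at every decoding step, the model can lean on the acoustic stream in some positions and on the pronunciation-prior stream in others, which is the "dynamic" use of the two sources the abstract describes.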
... Piske et al. [2] summarized some of these factors, including age of L2 learning, length of residence in an L2-speaking country, gender, formal instruction, motivation, language learning aptitude, and amount of native language use. Phonologically, a large body of research has also examined the influence of segmental and suprasegmental features on the degree of perceived foreign accent [3,4,5,6]. However, interest has mainly focused on segmental deviations from native pronunciation, owing to the complexity of prosody and its pedagogical challenges [7]. ...
... Despite the scarcity of research on foreign accent arising from suprasegmental factors, some studies have argued that prosody also contributes significantly to the overall impression of foreign accent [5]. ...
... Improper sentence stress production is also a primary cause of perceived foreign accent. Studies have corroborated that reducing sentence stress errors helps non-native learners produce English rhythm in a more comprehensible and native-like way [5,8,9]. Nevertheless, many Chinese learners of English struggle with the complexity of English prosody because they are unaware of the striking differences between their L1 and the target L2. ...
... Sentence stress can form a natural stress pattern characteristic of a given language and emphasize particular words according to their relative importance. Sentence stress differs from pitch accent: a pitch accent carries pitch prominence caused by an intonation event in addition to the rhythmic prominence caused by sentence stress [5]. ...
... The standard approach to prosody annotation is based on Tone and Break Indices (ToBI) [25]. As ToBI focuses on pitch accent, which is not fully equivalent to sentence stress, we use the Aix-MARSEC (Aix-Machine Readable Spoken English Corpus) database [26], as in [5]. Aix-MARSEC consists of over 5 hours of BBC radio recordings from the 1980s, produced by 53 different speakers in 11 different speech styles. ...
... We use the original phrase break annotations for minor and major boundaries, which are equivalent to break indices 3 and 4 in ToBI [27]. Following previous work, we treat the first syllable of each Jassem narrow rhythm unit (NRU) as stressed [5]. For practical purposes, we merge minor and major boundaries into a single break label. ...
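The labelling conventions above can be sketched as a small conversion step. The input format is a hypothetical simplification of the Aix-MARSEC annotation: each narrow rhythm unit is given as a list of syllables, anacrusis (unstressed syllables outside any NRU) is ignored, and phrase boundaries are given as a mapping from syllable index to "minor" or "major".

```python
def derive_labels(nrus, boundaries):
    """Derive syllable-level stress and break labels.

    nrus: list of narrow rhythm units, each a list of syllables in order.
    boundaries: dict mapping the index of a syllable (in the flattened list)
    to "minor" or "major" when a phrase boundary follows it.
    The first syllable of each NRU is labelled stressed (1); minor and major
    boundaries are merged into one binary break label.
    Returns (syllables, stress labels, break labels).
    """
    syllables, stress = [], []
    for nru in nrus:
        for position, syllable in enumerate(nru):
            syllables.append(syllable)
            stress.append(1 if position == 0 else 0)
    breaks = [1 if boundaries.get(i) in ("minor", "major") else 0
              for i in range(len(syllables))]
    return syllables, stress, breaks
```

Merging the two boundary types turns phrase-break detection into a binary task, matching the binary sentence-stress task so that both can share one labelling scheme.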
Prosodic event detection plays an important role in spoken language processing tasks and Computer-Assisted Pronunciation Training (CAPT) systems [1]. Traditional methods for detecting sentence stress and phrase boundaries rely on machine learning methods that model limited contextual information and take little account of the interaction between these two prosodic events. In this paper, we propose a hierarchical network based on bidirectional Long Short-Term Memory (BLSTM) that models contextual factors at the granularity of phonemes, syllables, and words. Moreover, to account for the inherent connection between sentence stress and phrase boundaries, we jointly model these two prosodic events in a multitask learning (MTL) framework that shares common prosodic features. We evaluate the network on the Aix-Machine Readable Spoken English Corpus (Aix-MARSEC). Experimental results show that the proposed method achieves an F1-measure of 90% for sentence stress detection and 91% for phrase boundary detection, outperforming a baseline using conditional random fields (CRF) by about 4% and 9%, respectively.
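In a multitask setup of this kind, a shared encoder feeds two task-specific output layers, and training minimizes a weighted sum of the two task losses. The sketch below shows only that joint objective (binary cross-entropy per task) plus the F1 measure used for evaluation; the shared BLSTM encoder is omitted, and the weighting factor `lam` is an assumption.

```python
import math


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def multitask_loss(stress_probs, stress_labels, break_probs, break_labels, lam=1.0):
    """Joint objective: mean BCE for sentence stress + lam * mean BCE for breaks."""
    loss_stress = sum(map(bce, stress_probs, stress_labels)) / len(stress_labels)
    loss_break = sum(map(bce, break_probs, break_labels)) / len(break_labels)
    return loss_stress + lam * loss_break


def f1(pred, gold):
    """F1 = 2TP / (2TP + FP + FN) for binary label sequences."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if g and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Because both losses backpropagate into the same encoder, cues useful for one event (for example, lengthening before a boundary) also shape the representation used for the other.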
... With stress prediction and detection, learners can clearly see whether they have placed the stress in the correct position and whether they realize it with the proper pitch pattern, without any delay. The results of this experiment show that learners' accuracy in rhythm and fluency improved after training with this system [6]. ...
Abstract: The present study investigates how multimodal training methods contribute to improving the L2 intonation produced by Chinese EFL learners. Altogether, 75 learners with an English-major background from 3 different dialectal regions of China were recruited. They were divided into 5 groups that differ in training method: the control group (G1), a group trained with sound only (G2), a group trained with sound and given after-training feedback (G3), a group trained with both audio and visual material (G4), and an audiovisual training group with feedback (G5). The results show no significant improvement between learners' pretest and posttest scores within any group; still, some learners in the experimental groups scored significantly higher in the posttest than those in the control group, and among the experimental groups G5 performed best, with the most intonation cases improved through training. This indicates that multimodal plus supervised training was the most effective method of L2 intonation teaching in this experiment. The less obvious improvement in the remaining cases might be due to the limited training time, which will be addressed by supplementary intensive training with this method.