Article

Computer-Based Fluency Evaluation of English Speaking Tests for Koreans


Abstract

In this paper, we propose an automatic fluency evaluation algorithm for English speaking tests. In the proposed algorithm, acoustic features are extracted from an input spoken utterance, and a fluency score is then computed using support vector regression (SVR). We estimate the parameters of the feature models and of the SVR from the speech signals and the corresponding scores assigned by human raters. Correlation analysis shows that speech rate, articulation rate, and mean length of runs are the best features for fluency evaluation. Experimental results show that the correlation between the human score and the SVR score is 0.87 for 3 speaking tests, which suggests that the proposed algorithm could serve as a secondary fluency evaluation tool.
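The three features named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define them, so the definitions below are the ones commonly used in fluency research and are assumptions here: speech rate as syllables per second of total time, articulation rate as syllables per second of time excluding pauses, and mean length of runs as the average number of syllables between pauses. The `Segment` type, the `fluency_measures` function, and the 0.2 s pause threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One time-aligned stretch of an utterance (hypothetical input format)."""
    start: float    # start time in seconds
    end: float      # end time in seconds
    syllables: int  # number of syllables; 0 for a silent stretch


def fluency_measures(segments: List[Segment],
                     min_pause: float = 0.2) -> Tuple[float, float, float]:
    """Return (speech_rate, articulation_rate, mean_length_of_runs).

    A silent segment counts as a pause only if it lasts at least
    `min_pause` seconds (an assumed threshold); shorter silences are
    treated as part of the surrounding speech.
    """
    total_time = segments[-1].end - segments[0].start
    total_syllables = sum(s.syllables for s in segments)

    pause_time = 0.0
    runs = []        # syllable counts of the runs between pauses
    current_run = 0
    for s in segments:
        duration = s.end - s.start
        if s.syllables == 0 and duration >= min_pause:
            pause_time += duration
            if current_run > 0:
                runs.append(current_run)
                current_run = 0
        else:
            current_run += s.syllables
    if current_run > 0:
        runs.append(current_run)

    speech_rate = total_syllables / total_time
    articulation_rate = total_syllables / (total_time - pause_time)
    mean_length_of_runs = sum(runs) / len(runs) if runs else 0.0
    return speech_rate, articulation_rate, mean_length_of_runs


example = [
    Segment(0.0, 1.0, 4),   # 4 syllables of speech
    Segment(1.0, 1.5, 0),   # 0.5 s silent pause
    Segment(1.5, 2.5, 6),   # 6 syllables of speech
]
speech_rate, articulation_rate, mlr = fluency_measures(example)
```

In a real system the segments would come from a forced aligner or recognizer; here they are written by hand to keep the sketch self-contained.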


... Jang and Kwon recently proposed a method for fluency and pronunciation evaluation that uses an aligner when the transcribed text is given [10]. However, it cannot evaluate the fluency of free-talking utterances when the transcribed text is not available. ...
... As in the previous paper [10], we selected SVR as the predictor of the fluency score because SVR is known to achieve good generalization performance by virtue of a nonlinear prediction function [18]-[21]. As shown in <Figure 3>, the basic goal of SVR is to find a function f(x) that deviates by at most ɛ from the actually obtained targets [21]. ...
... The pauses considered here are those inside sentences, not those between sentences. To avoid the problem caused by hard clipping of the pause duration and to obtain the SUPR as a continuous value, a sigmoid function was applied to the duration of silent segments [10]. The lenUP measures the length of unfilled pauses and is calculated as the total duration of unfilled pauses divided by the number of unfilled pauses [3]. ...
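The two pause features in the last excerpt can be sketched in Python as follows. The lenUP formula follows the excerpt directly (total duration divided by count). The excerpt does not give the exact form of the sigmoid or of the SUPR normalization, so the soft pause count, the 0.2 s threshold, the slope of 20, and the division by total time are all assumptions made for illustration, as are the function names.

```python
import math
from typing import Sequence


def mean_unfilled_pause_length(pause_durations: Sequence[float]) -> float:
    """lenUP: total duration of unfilled pauses divided by their number."""
    if not pause_durations:
        return 0.0
    return sum(pause_durations) / len(pause_durations)


def smoothed_pause_count(silence_durations: Sequence[float],
                         threshold: float = 0.2,
                         slope: float = 20.0) -> float:
    """Soft count of pauses.

    Instead of counting a silent segment as a pause only when its duration
    exceeds a hard threshold, each segment contributes a value between
    0 and 1 given by a sigmoid centred on `threshold` (assumed values).
    """
    return sum(1.0 / (1.0 + math.exp(-slope * (d - threshold)))
               for d in silence_durations)


def smoothed_unfilled_pause_rate(silence_durations: Sequence[float],
                                 total_time: float) -> float:
    """Assumed SUPR form: soft pause count per second of utterance."""
    return smoothed_pause_count(silence_durations) / total_time
```

With a hard threshold, a silence of 0.19 s and one of 0.21 s would be treated completely differently; with the sigmoid, both contribute roughly half a pause, which is the continuity property the excerpt describes.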
Article
We propose a new method for automatic fluency scoring of English speaking tests spoken by nonnative speakers in a free-talking style. The proposed method differs from previous methods in that it does not require transcribed texts for the spoken utterances. First, an input utterance is segmented into a phone sequence by a phone recognizer trained on native speech databases. For each utterance, a feature vector with 6 features is extracted by processing the segmentation results of the phone recognizer. Then, a fluency score is computed by applying support vector regression (SVR) to the feature vector. The parameters of the SVR are learned from the rater scores for the utterances. In computer experiments with 3 tests taken by 48 Korean adults, we show that speech rate, phonation time ratio, and smoothed unfilled pause rate are the best features for fluency scoring. The correlation between the rater score and the SVR score is shown to be 0.84, which is higher than the correlation of 0.78 among raters. Although this correlation is slightly lower than the correlation of 0.90 obtained when the transcribed texts are given, it implies that the proposed method can be used as a preprocessing tool for fluency evaluation of speaking tests.
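Agreement figures such as those reported above are typically computed as a correlation coefficient between two lists of scores for the same utterances. The abstract does not say which coefficient was used; the sketch below assumes the Pearson correlation, and the function name and the sample scores are purely illustrative.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return covariance / math.sqrt(var_x * var_y)


# Made-up scores for five utterances, for illustration only.
rater_scores = [2.0, 3.5, 4.0, 1.5, 5.0]
machine_scores = [2.2, 3.1, 4.3, 1.9, 4.6]
agreement = pearson_correlation(rater_scores, machine_scores)
```

The same function applies to rater-versus-rater agreement by passing the scores of two human raters.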
Book
HTK is a toolkit for building Hidden Markov Models (HMMs). HMMs can be used to model any time series and the core of HTK is similarly general-purpose. However, HTK is primarily designed for building HMM-based speech processing tools, in particular recognisers. Thus, much of the infrastructure support in HTK is dedicated to this task. As shown in the picture above, there are two major processing stages involved. Firstly, the HTK training tools are used to estimate the parameters of a set of HMMs using training utterances and their associated transcriptions. Secondly, unknown utterances are transcribed using the HTK recognition tools.
Article
There have been few attempts to specify precisely what fluency is, although it is a term that is used and understood by both laypeople and linguists. With one objective being to achieve a greater understanding of what comprises fluency, this study explores the speech of six nonnative speakers of English who had been rated by English instructors as either "fluent" or "nonfluent." Excerpts of audiotaped dialogues were analyzed, both at the utterance level and at the discourse level, in terms of the frequency and possible function of features that have often been ascribed to fluency (hesitation phenomena, repair, and rate of speech). More functionally oriented pragmatic features that relate to topic control and initiation were also examined. Comparison of subgroups on the basis of perceived fluency or lack of fluency provided few statistically significant results, although the findings of qualitative analyses suggest that fluency is a complex, high-order linguistic phenomenon and that intuitive judgments about fluency level, such as those made by the raters for this study, may take into account a wide range of linguistic phenomena. Thus, this study offers an innovative approach to understanding fluency, with possible implications for the teaching and testing of languages, as well as for research on second language acquisition processes.
Chapter
Support Vector Machines are used for time series prediction and compared to radial basis function networks. We make use of two different cost functions for Support Vectors: training with (i) an ε-insensitive loss and (ii) Huber's robust loss function, and discuss how to choose the regularization parameters in these models. Two applications are considered: data from (a) a noisy (normal and uniform noise) Mackey-Glass equation and (b) the Santa Fe competition (set D). In both cases Support Vector Machines show excellent performance. In case (b) the Support Vector approach improves the best known result on the benchmark by 29%.
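The two cost functions named in this abstract have standard textbook forms, sketched below in Python. The function names and the default values of `eps` and `delta` are illustrative assumptions, not values taken from the chapter.

```python
def eps_insensitive_loss(residual: float, eps: float = 0.1) -> float:
    """Zero inside the eps-tube around the target, linear outside it."""
    return max(0.0, abs(residual) - eps)


def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Quadratic for small residuals, linear for large ones."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)
```

The ε-insensitive loss ignores errors smaller than `eps`, which is what yields sparse support vector solutions; Huber's loss penalizes every error but grows only linearly for large ones, which limits the influence of outliers.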
Conference Paper
This paper presents a new technique of cepstral analysis synthesis on the mel frequency scale. The log spectrum on the mel frequency scale (the mel log spectrum) is considered to be an effective representation of the spectral envelope of speech. This analysis synthesis system uses the mel log spectrum approximation (MLSA) filter, which was devised for cepstral synthesis on the mel frequency scale. The filter coefficients are easily obtained through a simple linear transform from the mel cepstrum, defined as the Fourier cosine coefficients of the mel log spectral envelope of speech. The MLSA filter has low coefficient sensitivity and good coefficient quantization characteristics. The spectral distortion caused by interpolation of the filter parameters of two successive frames is small. Accordingly, the data rate of this system is very low. Speech of the same quality is synthesized at 60-70% of the data rate of the conventional cepstral vocoder or the LPC vocoder.
Article
We present a paradigm for the automatic assessment of pronunciation quality by machine. In this scoring paradigm, both native and nonnative speech data are collected and a database of human-expert ratings is created to enable the development of a variety of machine scores. We first discuss issues related to the design of speech databases and the reliability of human ratings. We then address pronunciation evaluation as a prediction problem, trying to predict the grade a human expert would assign to a particular skill. Using the speech and the expert-ratings databases, we build statistical models and introduce different machine scores that can be used as predictor variables. We validate these machine scores on the Voice Interactive Language Training System (VILTS) corpus, evaluating the pronunciation of American speakers speaking French, and we show that certain machine scores, like the log-posterior and the normalized duration, achieve a correlation with the targeted human grades that is comparable to the human-to-human correlation when a sufficient amount of speech data is available.
Article
The research reported in this paper explores which variables predict native and non-native speaking teachers' perception of fluency and distinguish fluent from non-fluent L2 learners. In addition to traditional measures of the quality of students' output such as accuracy and lexical diversity, we investigated speech samples collected from 16 Hungarian L2 learners at two distinct levels of proficiency with the help of computer technology. The two groups of students were compared and their temporal and linguistic measures were correlated with the fluency scores they received from three experienced native and three non-native speaker teacher judges. The teachers' written comments concerning the students' performance were also taken into consideration. For all the native and non-native teachers, speech rate, the mean length of utterance, phonation time ratio and the number of stressed words produced per minute were the best predictors of fluency scores. However, the raters differed as regards how much importance they attributed to accuracy, lexical diversity and the mean length of pauses. The number of filled and unfilled pauses and other disfluency phenomena were not found to influence perceptions of fluency.
Article
Fluency is a commonly used notion in foreign language teaching, frequently contrasted with accuracy, especially in communicative language teaching. In ordinary life it often has an extended meaning and is used as a synonym of overall oral proficiency. By contrast, in the assessment of foreign language proficiency, it is one of several descriptors of oral performance. Despite the belief that we share a common definition as language teachers and researchers, there is some evidence that agreement cannot be taken for granted and various interpretations coexist. The purpose of this paper is to review recent research into the qualitative and quantitative aspects of fluency in order to arrive at a clearer definition of the word, both as a performance descriptor for oral assessment of foreign language learners and as an indicator of progress in language learning. It is suggested that research into temporal variables in speech production provides concrete evidence which can contribute to a more precise definition of fluency. However, a purely quantitative definition of fluency does not enable us to discover how to facilitate efficient processes of speech production. A qualitative, linguistic analysis of the language produced by advanced language learners reveals some of the links between linguistic knowledge and performance skills.
Conference Paper
The DARPA Spoken Language System (SLS) community has long taken a leadership position in designing, implementing, and globally distributing significant speech corpora widely used for advancing speech recognition research. The Wall Street Journal (WSJ) CSR Corpus described here is the newest addition to this valuable set of resources. In contrast to previous corpora, the WSJ corpus will provide DARPA its first general-purpose English, large vocabulary, natural language, high perplexity, corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value. This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus.
Article
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
Article
In this article, it will be argued that the proceduralization of linguistic knowledge is the most important factor in the development of fluency in advanced second language learners. Levelt's (1989) model of language production is used to provide the descriptive base for the sub-processes of language production. This posits the existence of a conceptualizer, a formulator, and an articulator, each of which contains procedural knowledge. Levelt's model does not, however, deal with how that knowledge is developed. It is proposed that Anderson's (1983) model of adaptive control of thought may be used to account for developmental aspects. This posits that the learning process involves the conversion of declarative knowledge into procedural knowledge via cognitive, associative, and autonomous stages of compilation and tuning. Neither Levelt nor Anderson, however, has stated how the contribution of the sub-processes or how the developmental stages may be measured in language use. It is argued that the temporal variables used by Grosjean and Deschamps (1972, 1973, 1975) provide a way of measuring (a) fluency and (b) the contribution of the sub-processes in the model. Evidence from 12 advanced learners of French and English is used to show how this may be done. Initial results from experiments indicate that on a specific task learners became more fluent (as measured by speaking rate) as a result of the period of residence abroad and that an increase in mean length of run was the most important of the temporal variables contributing to this development. It is argued that the increase in mean length of run is mainly attributable to the proceduralization of different kinds of knowledge, including procedural knowledge of syntax and of lexical phrases (Nattinger and DeCarrico 1992). The way in which this may have taken place is illustrated by means of extracts from the texts produced by the subjects. We conclude that the quantitative and qualitative evidence supports the contention that increases in fluency are attributable mainly to increases in the degree of proceduralization of knowledge.
Fillmore, C. J. (1979). On fluency. Individual differences in language ability and language behavior, 85-101.
Lenzo, K. (2007). The CMU pronouncing dictionary.
Vertanen, K. (1994). HTK Wall Street Journal Training Recipe. http://www.keithv.com.